papiex version                : 1.2.6
papiex build                  : Feb 26 2018/16:07:35
Executable                    : /home/tushar/mpiwave/mpi_wave
Processor                     : Intel(R) Xeon(R) CPU X5650 @ 2.67GHz
Clockrate (MHz)               : 2660.000000
Hostname                      : gulftown
Options                       : MULTIPLEX,MEMORY,PAPI_TOT_INS,PAPI_FP_INS,PAPI_LST_INS,PAPI_BR_INS,PAPI_LD_INS,PAPI_SR_INS,PAPI_TOT_CYC,PAPI_RES_STL,PAPI_L1_DCM,PAPI_L1_ICM,PAPI_TLB_DM,PAPI_TLB_IM,PAPI_L2_DCM,PAPI_L2_ICM,PAPI_BR_CN,PAPI_BR_MSP,PAPI_FP_OPS,PAPI_L2_DCA,PAPI_L2_ICA,NEXTGEN
Domain                        : User
Parent process id             : 4938
Process id                    : 4954
Start                         : Tue Feb 27 15:52:17 2018
Finish                        : Tue Feb 27 15:52:23 2018
Num. of tasks                 : 16

Global derived data:
MFLOPS wallclock ............................. 1.90795e-01
MFLOPS ....................................... 5.96788e-03
IPC .......................................... 6.94317e-01
Flops Per Load Store ......................... 9.09343e-05
Flops Per L1 Data Cache Miss ................. 1.26560e-02
Load Store Ratio ............................. 2.21776e+00
Instructions Per Dcache Miss ................. 2.41626e+02

Time:
Wallclock seconds ............................ 6.11238e+00
IO seconds ................................... 4.94118e-02
Resource Stall seconds ....................... 7.00915e-01
FP Stall seconds ............................. 4.38425e-04

Cycles:
Cycles In Domain ............................. 3.20677e+10
Real Cycles .................................. 5.21106e+11
Running Time In Domain % ..................... 6.15377e+00
Virtual Cycles ............................... 2.42957e+11
IO Cycles % .................................. 2.52224e-02
MPI Cycles % ................................. 4.63798e+01
MPI Sync Cycles % ............................ 0.00000e+00

Instructions:
Total Instructions ........................... 2.22651e+10
Memory Instructions % ........................ 5.76002e+01
FP Instructions % ............................ 5.17285e-03
Branch Instructions % ........................ 2.32254e+01

Memory:
Load Store Ratio ............................. 2.21776e+00
L1 Data Misses Per 1000 Load Stores .......... 7.18509e+00
L1 Instruction Misses Per 1000 Instructions .. 1.45460e+00
L2 Data Misses Per 1000 L2 Load Stores ....... 7.31839e+01
L2 Instruction Misses Per 1000 L2 Instructions 2.87965e+02
Data TLB Misses Per 1000 Load Stores ......... 7.45501e-02
Instruction TLB Misses Per 1000 Instructions . 3.54490e-01
L1 Bandwidth MBytes per second ............... 1.67853e+04

Stalls:
Resource Stall Cycles % ...................... 5.81405e+00
Branch Misprediction % ....................... 7.86574e-03

Arch Parameters:
L2 LATENCY IN CYCLES ......................... 1.00000e+01
L3 LATENCY IN CYCLES ......................... 4.00000e+01
MEMORY LATENCY IN CYCLES ..................... 2.00000e+02
WORD SIZE IN BYTES ........................... 8.00000e+00

-------------------------------------------------------------------------------
Default spec file ($Id: papi 359 2012-01-06 09:21:17Z tushar $)

Metric Descriptions:
Unless mentioned otherwise, counts are accumulated across sub-processes/threads

MFLOPS wallclock : Millions of floating point ops per *wallclock* second
    PAPI_FP_OPS/WALL_CLOCK_USEC
MFLOPS : Millions of FP ops per second
    PAPI_FP_OPS / Real_usecs
IPC : Instructions retired per cycle
    PAPI_TOT_INS / PAPI_TOT_CYC
Flops Per Load_Store : PAPI_FP_OPS / PAPI_LST_INS
Flops Per L1 Data Cache Miss : PAPI_FP_OPS / PAPI_L1_DCM
Load Store Ratio : Ratio of loads to stores.
    PAPI_LD_INS / PAPI_SR_INS
Instructions Per Dcache Miss : PAPI_TOT_INS / PAPI_L1_DCM
Wallclock seconds : Unhalted wallclock time. Never counted twice.
    WALL_CLOCK_USEC / 1000000
IO seconds : Time spent in seconds doing I/O. This includes any time in I/O, including time outside domain, when the process is waiting for I/O to complete.
    IO cycles / (Clock Hz)
No Issue Stall seconds : Time in domain with no instruction issue. This would not include cycles outside domain such as system time or I/O time or time process was not scheduled to run.
    PAPI_STL_ICY / (Clock Hz)
Resource Stall seconds : Time in seconds stalled on any resource. Resource can be an integer or FP register or reservation station. It would not include stalls waiting for memory operands.
    PAPI_RES_STL / (Clock Hz)
FP Stall seconds : Floating-point stall seconds.
    PAPI_FP_OPS / (Clock Hz)
Memory Stall seconds : Time in seconds waiting for memory operations.
    PAPI_MEM_SCY / (Clock Hz)
Max L1 Miss L2 Hit Stall sec : Maximum time in stalls on L1 miss/L2 hits. This is an estimate calculated using L1 and L2 cache hits and the L2 access latency. The actual latency may have been masked by hit-under-miss or instruction scheduling.
    (PAPI_LI_DCM-PAPI_L2_DCM)*L2_LATENCY/(Clock Hz)
Max L2 Miss L3 Hit Stall sec : Maximum time in stalls on L2 miss/L3 hits. This is an estimate calculated using L2 and L3 cache hits and the L3 access latency. The actual latency may have been masked by hit-under-miss or instruction scheduling.
    (PAPI_L2_DCM-PAPI_L3_DCM)*L3_LATENCY/(Clock Hz)
Max Memory Access Stall_sec : Maximum time in stalls waiting on memory This is an estimate calculated using L3 cache misses and the L3 access latency. The actual latency may have been masked by hit-under-miss or instruction scheduling.
    PAPI_L3_DCM * MEM_LATENCY / (Clock Hz)
Cycles In Domain : Total processor cycles in the PAPI domain PAPI_TOT_CYC. Note, this cycle counter is more granular and accurate than Real or Virtual cycles counter. This may lead to situations where this value is measured as higher than even Real cycles.
Real Cycles : Always counted, unhalted.
Running Time In Domain % : Percent of processor time spent in the domain
    100 * PAPI_TOT_CYC / Real cycles
Virtual Cycles : Counted only when executing on the processor.
IO Cycles % : Percent of cycles spent in I/O
    100 * IO cycles / Real cycles
MPI Cycles % : Percent of cycles spent in MPI
    100 * MPI cycles / Real cycles
MPI Sync Cycles % : Percent of cycles spent in MPI sync ops
    100 * MPI Sync cycles / Real cycles
Thread Sync Cycles % : Percent of cycles spent in thread synchronization
    100 * Thr Sync cycles / Real cycles
Total Instructions : Completed instructions
    PAPI_TOT_INS
Memory Instructions % : Percent of instructions that are memory
    100 * PAPI_LST_INS / PAPI_TOT_INS or 100 * (PAPI_LD_INS + PAPI_SR_INS) / PAPI_TOT_INS
FP Instructions % : Percent of instructions that are floating point
    100 * PAPI_FP_INS / PAPI_TOT_INS
FP Instructions % approx : Approximate FP instruction percent, using FP ops instead of instructions
    100 * PAPI_FP_OPS / PAPI_TOT_INS
Branch Instructions % : Percent of instructions that are branches
    100 * PAPI_BR_INS / PAPI_TOT_INS
Integer Instructions % : Percent of instructions that are of integer type
    100 * PAPI_INT_INS / PAPI_TOT_INS
Load Store Ratio : Ratio of loads to stores
    PAPI_LD_INS / PAPI_SR_INS
L1 Data Misses Per 1000 LD/ST : L1 data misses per thousand L1 data references
    1000 * PAPI_L1_DCM / PAPI_LST_INS or 1000 * PAPI_L1_DCM / PAPI_L1_DCA
L1 Instruction Misses Per 1000 : L1 I-cache misses per thousand instructions
    1000 * PAPI_L1_ICM / PAPI_TOT_INS
L2 Data Misses Per 1000 : L2 data cache misses per thousand L2 data references
    1000 * PAPI_L2_DCM / PAPI_L2_DCA
L2 Instruction Misses Per 1000 : L2 instruction cache misses per thousand L2 I-cache references
    1000 * PAPI_L2_ICM / PAPI_L2_ICA
Data TLB Misses Per 1000 LD/ST : D-TLB misses per thousand load stores
    1000 * PAPI_TLB_DM / PAPI_LST_INS
Ins. TLB Misses Per 1000 ins. : I-TLB misses per thousand instructions
    1000 * PAPI_TLB_IM / PAPI_TOT_INS
L1 Bandwidth MBytes per second : Effective cumulative L1 bandwidth achieved
    (PAPI_LD_INS + PAPI_SR_INS) * WORD_SIZE / Wallclock usecs
Resource Stall Cycles % : Percent of total cycles stalled for any resource
    100 * PAPI_RES_STL / PAPI_TOT_CYC
Memory Stall Cycles % : Percent of total cycles stalled for memory ops
    100 * PAPI_MEM_SCY / PAPI_TOT_CYC
FP Stall Cycles % : Percent of total cycles stalled for FP ops
    100 * PAPI_FP_STAL / PAPI_TOT_CYC
No Issue Cycle % : Percent of cycles with no issue
    100 * PAPI_STL_ICY / PAPI_TOT_CYC
Full Issue Cycle % : Percent of cycles with full issue
    100 * PAPI_FUL_ICY / PAPI_TOT_CYC
FPU Idle Cycle % : Percent of cycles where the FP unit was idle
    100 * PAPI_FPU_IDL / PAPI_TOT_CYC
LSU Idle Cycle % : Percent of cycles the load-store unit was idle
    100 * PAPI_LSU_IDL / PAPI_TOT_CYC
Branch Misprediction % : Percent of mispredicted branches
    100 * PAPI_BR_MSP / PAPI_BR_INS
-------------------------------------------------------------------------------

Global counts data:
Event                                          Sum         Min         Max         Mean        CV
IO cycles .................................... 1.31435e+08 1.61933e+06 1.23498e+07 8.21470e+06 5.09044e-01
MPI Sync cycles .............................. 0.00000e+00
MPI cycles ................................... 2.41688e+11 5.45499e+07 1.61163e+10 1.51055e+10 2.57267e-01
Mem. heap KB ................................. 2.03140e+05 1.26880e+04 1.28200e+04 1.26962e+04 2.51666e-03
Mem. library KB .............................. 1.42912e+05 8.93200e+03 8.93200e+03 8.93200e+03 0.00000e+00
Mem. locked KB ............................... 0.00000e+00
Mem. resident peak KB ........................ 2.18444e+05 1.35160e+04 1.43240e+04 1.36528e+04 1.40227e-02
Mem. shared KB ............................... 1.04796e+05 6.46400e+03 6.68400e+03 6.54975e+03 9.35446e-03
Mem. stack KB ................................ 7.38400e+03 4.52000e+02 4.68000e+02 4.61500e+02 9.12909e-03
Mem. text KB ................................. 1.28000e+02 8.00000e+00 8.00000e+00 8.00000e+00 0.00000e+00
Mem. virtual peak KB ......................... 0.00000e+00
PAPI_BR_CN ................................... 2.91899e+09 1.23244e+07 2.09682e+08 1.82437e+08 2.45820e-01
PAPI_BR_INS .................................. 5.17116e+09 1.86565e+07 3.68398e+08 3.23198e+08 2.48730e-01
PAPI_BR_MSP .................................. 4.06750e+05 7.82800e+03 7.04680e+04 2.54219e+04 8.30503e-01
PAPI_FP_INS .................................. 1.15174e+06 3.00000e+00 1.15170e+06 7.19839e+04 3.87282e+00
PAPI_FP_OPS .................................. 1.16621e+06 3.00000e+00 1.16617e+06 7.28882e+04 3.87282e+00
PAPI_L1_DCM .................................. 9.21470e+07 9.76134e+05 8.70715e+06 5.75919e+06 4.01360e-01
PAPI_L1_ICM .................................. 3.23868e+07 5.80273e+05 4.63475e+06 2.02417e+06 5.39519e-01
PAPI_L2_DCA .................................. 9.21470e+07 9.76134e+05 8.70715e+06 5.75919e+06 4.01360e-01
PAPI_L2_DCM .................................. 6.74368e+06 1.22628e+05 1.12168e+06 4.21480e+05 6.64362e-01
PAPI_L2_ICA .................................. 2.49389e+06 9.77630e+04 4.18479e+05 1.55868e+05 4.51171e-01
PAPI_L2_ICM .................................. 7.18155e+05 5.51000e+02 8.87580e+04 4.48847e+04 6.52929e-01
PAPI_LD_INS .................................. 8.83915e+09 1.79181e+07 6.35227e+08 5.52447e+08 2.54676e-01
PAPI_LST_INS ................................. 1.28248e+10 2.69933e+07 9.20272e+08 8.01547e+08 2.54441e-01
PAPI_RES_STL ................................. 1.86443e+09 1.55340e+07 1.81292e+08 1.16527e+08 3.64752e-01
PAPI_SR_INS .................................. 3.98561e+09 9.07517e+06 2.85045e+08 2.49101e+08 2.54034e-01
PAPI_TLB_DM .................................. 9.56087e+05 2.33280e+04 1.00804e+05 5.97554e+04 3.24495e-01
PAPI_TLB_IM .................................. 7.89276e+06 4.70300e+03 7.45376e+06 4.93298e+05 3.64328e+00
PAPI_TOT_CYC ................................. 3.20677e+10 9.65905e+07 2.44788e+09 2.00423e+09 2.54396e-01
PAPI_TOT_INS ................................. 2.22651e+10 8.53873e+07 1.59859e+09 1.39157e+09 2.47673e-01
Real cycles .................................. 5.21106e+11 3.25559e+10 3.25905e+10 3.25691e+10 3.50867e-04
Real usecs ................................... 1.95415e+08 1.22085e+07 1.22214e+07 1.22134e+07 3.50744e-04
Virtual cycles ............................... 2.42957e+11 1.69785e+08 1.62038e+10 1.51848e+10 2.55312e-01
Virtual usecs ................................ 9.13371e+07 6.38310e+04 6.09165e+06 5.70857e+06 2.55312e-01
Wallclock usecs .............................. 6.11238e+06 6.11141e+06 6.11718e+06 6.11381e+06 2.99300e-04

Event Descriptions:
PAPI_TOT_INS : Instructions completed
PAPI_FP_INS : Floating point instructions
PAPI_LST_INS : Load/store instructions completed
PAPI_BR_INS : Branch instructions
PAPI_LD_INS : Load instructions
PAPI_SR_INS : Store instructions
PAPI_TOT_CYC : Total cycles
PAPI_RES_STL : Cycles stalled on any resource
PAPI_L1_DCM : Level 1 data cache misses
PAPI_L1_ICM : Level 1 instruction cache misses
PAPI_TLB_DM : Data translation lookaside buffer misses
PAPI_TLB_IM : Instruction translation lookaside buffer misses
PAPI_L2_DCM : Level 2 data cache misses
PAPI_L2_ICM : Level 2 instruction cache misses
PAPI_BR_CN : Conditional branch instructions
PAPI_BR_MSP : Conditional branch instructions mispredicted
PAPI_FP_OPS : Floating point operations
PAPI_L2_DCA : Level 2 data cache accesses
PAPI_L2_ICA : Level 2 instruction cache accesses

Rank mapping:
[0] => gulftown (PID 4948)
[1] => gulftown (PID 4950)
[10] => gulftown (PID 4960)
[11] => gulftown (PID 4956)
[12] => gulftown (PID 4961)
[13] => gulftown (PID 4947)
[14] => gulftown (PID 4962)
[15] => gulftown (PID 4951)
[2] => gulftown (PID 4959)
[3] => gulftown (PID 4949)
[4] => gulftown (PID 4958)
[5] => gulftown (PID 4957)
[6] => gulftown (PID 4953)
[7] => gulftown (PID 4954)
[8] => gulftown (PID 4952)
[9] => gulftown (PID 4955)

Rank counts data (by field):
IO cycles
    1.23498e+07 [7] 1.20589e+07 [9] 1.19056e+07 [4] 1.18139e+07 [2] 1.18088e+07 [3] 1.17208e+07 [11] 1.10399e+07 [10] 1.03319e+07 [1] 1.03163e+07 [13] 1.00010e+07 [14] 5.15384e+06 [6] 5.11803e+06 [12] 2.34450e+06 [0] 2.06152e+06 [15] 1.79099e+06 [8] 1.61933e+06 [5]
MPI Sync cycles
    0.00000e+00 [8] 0.00000e+00 [11] 0.00000e+00 [1] 0.00000e+00 [5] 0.00000e+00 [6] 0.00000e+00 [13] 0.00000e+00 [7] 0.00000e+00 [12] 0.00000e+00 [2] 0.00000e+00 [9] 0.00000e+00 [4] 0.00000e+00 [14] 0.00000e+00 [3] 0.00000e+00 [10] 0.00000e+00 [0] 0.00000e+00 [15]
MPI cycles
    1.61163e+10 [15] 1.61129e+10 [6] 1.61128e+10 [12] 1.61125e+10 [7] 1.61116e+10 [1] 1.61112e+10 [3] 1.61107e+10 [13] 1.61073e+10 [14] 1.61065e+10 [9] 1.61064e+10 [4] 1.61063e+10 [2] 1.61055e+10 [8] 1.61050e+10 [5] 1.61043e+10 [10] 1.61040e+10 [11] 5.45499e+07 [0]
Mem. heap KB
    1.28200e+04 [0] 1.26880e+04 [2] 1.26880e+04 [12] 1.26880e+04 [13] 1.26880e+04 [6] 1.26880e+04 [7] 1.26880e+04 [8] 1.26880e+04 [11] 1.26880e+04 [1] 1.26880e+04 [5] 1.26880e+04 [15] 1.26880e+04 [10] 1.26880e+04 [9] 1.26880e+04 [4] 1.26880e+04 [14] 1.26880e+04 [3]
Mem. library KB
    8.93200e+03 [2] 8.93200e+03 [12] 8.93200e+03 [7] 8.93200e+03 [13] 8.93200e+03 [6] 8.93200e+03 [1] 8.93200e+03 [5] 8.93200e+03 [11] 8.93200e+03 [8] 8.93200e+03 [15] 8.93200e+03 [0] 8.93200e+03 [10] 8.93200e+03 [3] 8.93200e+03 [4] 8.93200e+03 [9] 8.93200e+03 [14]
Mem. locked KB
    0.00000e+00 [10] 0.00000e+00 [15] 0.00000e+00 [0] 0.00000e+00 [3] 0.00000e+00 [4] 0.00000e+00 [9] 0.00000e+00 [14] 0.00000e+00 [12] 0.00000e+00 [2] 0.00000e+00 [1] 0.00000e+00 [5] 0.00000e+00 [11] 0.00000e+00 [8] 0.00000e+00 [7] 0.00000e+00 [6] 0.00000e+00 [13]
Mem. resident peak KB
    1.43240e+04 [0] 1.37920e+04 [11] 1.37560e+04 [10] 1.37080e+04 [8] 1.36440e+04 [2] 1.36440e+04 [5] 1.36320e+04 [9] 1.35960e+04 [7] 1.35800e+04 [6] 1.35680e+04 [15] 1.35560e+04 [3] 1.35360e+04 [13] 1.35360e+04 [4] 1.35320e+04 [12] 1.35240e+04 [14] 1.35160e+04 [1]
Mem. shared KB
    6.68400e+03 [10] 6.63600e+03 [11] 6.63200e+03 [15] 6.60800e+03 [7] 6.56000e+03 [8] 6.55600e+03 [9] 6.55600e+03 [2] 6.55200e+03 [0] 6.54400e+03 [5] 6.53600e+03 [13] 6.50800e+03 [1] 6.50400e+03 [14] 6.50000e+03 [3] 6.48400e+03 [6] 6.47200e+03 [12] 6.46400e+03 [4]
Mem. stack KB
    4.68000e+02 [5] 4.68000e+02 [8] 4.64000e+02 [12] 4.64000e+02 [11] 4.64000e+02 [7] 4.64000e+02 [15] 4.64000e+02 [0] 4.64000e+02 [3] 4.60000e+02 [2] 4.60000e+02 [13] 4.60000e+02 [6] 4.60000e+02 [14] 4.60000e+02 [4] 4.56000e+02 [1] 4.56000e+02 [9] 4.52000e+02 [10]
Mem. text KB
    8.00000e+00 [2] 8.00000e+00 [12] 8.00000e+00 [7] 8.00000e+00 [6] 8.00000e+00 [13] 8.00000e+00 [5] 8.00000e+00 [1] 8.00000e+00 [8] 8.00000e+00 [11] 8.00000e+00 [15] 8.00000e+00 [0] 8.00000e+00 [10] 8.00000e+00 [3] 8.00000e+00 [14] 8.00000e+00 [4] 8.00000e+00 [9]
Mem. virtual peak KB
    0.00000e+00 [15] 0.00000e+00 [0] 0.00000e+00 [10] 0.00000e+00 [3] 0.00000e+00 [4] 0.00000e+00 [9] 0.00000e+00 [14] 0.00000e+00 [2] 0.00000e+00 [12] 0.00000e+00 [7] 0.00000e+00 [6] 0.00000e+00 [13] 0.00000e+00 [1] 0.00000e+00 [5] 0.00000e+00 [8] 0.00000e+00 [11]
PAPI_BR_CN
    2.09682e+08 [6] 2.08833e+08 [4] 2.01684e+08 [7] 2.01297e+08 [3] 2.00023e+08 [13] 1.98271e+08 [11] 1.92745e+08 [5] 1.92410e+08 [10] 1.90818e+08 [1] 1.90477e+08 [2] 1.89582e+08 [12] 1.88444e+08 [15] 1.87032e+08 [9] 1.81001e+08 [14] 1.74366e+08 [8] 1.23244e+07 [0]
PAPI_BR_INS
    3.68398e+08 [4] 3.66387e+08 [6] 3.65286e+08 [3] 3.64591e+08 [13] 3.55126e+08 [7] 3.49287e+08 [11] 3.39404e+08 [15] 3.38904e+08 [12] 3.38748e+08 [1] 3.37186e+08 [10] 3.33743e+08 [5] 3.32106e+08 [14] 3.32028e+08 [9] 3.26389e+08 [2] 3.04923e+08 [8] 1.86565e+07 [0]
PAPI_BR_MSP
    7.04680e+04 [0] 6.27150e+04 [2] 5.55960e+04 [8] 4.53090e+04 [10] 4.33190e+04 [6] 1.46170e+04 [12] 1.34600e+04 [3] 1.29810e+04 [5] 1.24300e+04 [9] 1.22080e+04 [15] 1.20810e+04 [1] 1.13540e+04 [7] 1.11550e+04 [14] 1.10640e+04 [11] 1.01650e+04 [13] 7.82800e+03 [4]
PAPI_FP_INS
    1.15170e+06 [0] 3.00000e+00 [8] 3.00000e+00 [11] 3.00000e+00 [5] 3.00000e+00 [1] 3.00000e+00 [6] 3.00000e+00 [13] 3.00000e+00 [7] 3.00000e+00 [12] 3.00000e+00 [2] 3.00000e+00 [14] 3.00000e+00 [4] 3.00000e+00 [9] 3.00000e+00 [3] 3.00000e+00 [10] 3.00000e+00 [15]
PAPI_FP_OPS
    1.16617e+06 [0] 3.00000e+00 [12] 3.00000e+00 [2] 3.00000e+00 [1] 3.00000e+00 [5] 3.00000e+00 [11] 3.00000e+00 [8] 3.00000e+00 [7] 3.00000e+00 [6] 3.00000e+00 [13] 3.00000e+00 [10] 3.00000e+00 [15] 3.00000e+00 [3] 3.00000e+00 [9] 3.00000e+00 [4] 3.00000e+00 [14]
PAPI_L1_DCM
    8.70715e+06 [14] 8.64055e+06 [2] 8.40187e+06 [9] 8.38132e+06 [8] 8.05716e+06 [10] 6.90047e+06 [15] 6.86976e+06 [5] 5.65160e+06 [12] 5.47517e+06 [11] 5.15161e+06 [1] 5.07237e+06 [13] 4.18008e+06 [7] 4.14132e+06 [3] 2.89406e+06 [4] 2.64637e+06 [6] 9.76134e+05 [0]
PAPI_L1_ICM
    4.63475e+06 [11] 4.51893e+06 [8] 2.65746e+06 [4] 2.46971e+06 [2] 2.21140e+06 [5] 2.08322e+06 [10] 1.92304e+06 [9] 1.69784e+06 [1] 1.67675e+06 [6] 1.59021e+06 [14] 1.35446e+06 [15] 1.31576e+06 [12] 1.28927e+06 [7] 1.28845e+06 [13] 1.09528e+06 [3] 5.80273e+05 [0]
PAPI_L2_DCA
    8.70715e+06 [14] 8.64055e+06 [2] 8.40187e+06 [9] 8.38132e+06 [8] 8.05716e+06 [10] 6.90047e+06 [15] 6.86976e+06 [5] 5.65160e+06 [12] 5.47517e+06 [11] 5.15161e+06 [1] 5.07237e+06 [13] 4.18008e+06 [7] 4.14132e+06 [3] 2.89406e+06 [4] 2.64637e+06 [6] 9.76134e+05 [0]
PAPI_L2_DCM
    1.12168e+06 [5] 1.01974e+06 [2] 6.83787e+05 [6] 6.22033e+05 [8] 3.35969e+05 [7] 3.28441e+05 [12] 3.27921e+05 [11] 3.18218e+05 [4] 3.00666e+05 [9] 2.94882e+05 [1] 2.66774e+05 [3] 2.65200e+05 [10] 2.59936e+05 [13] 2.46056e+05 [14] 2.29742e+05 [15] 1.22628e+05 [0]
PAPI_L2_ICA
    4.18479e+05 [0] 1.75177e+05 [13] 1.65811e+05 [10] 1.64853e+05 [8] 1.48034e+05 [5] 1.43058e+05 [12] 1.40359e+05 [6] 1.39803e+05 [3] 1.35717e+05 [1] 1.33145e+05 [2] 1.32074e+05 [14] 1.30099e+05 [7] 1.29389e+05 [4] 1.26826e+05 [11] 1.13307e+05 [15] 9.77630e+04 [9]
PAPI_L2_ICM
    8.87580e+04 [2] 8.64540e+04 [6] 8.51670e+04 [8] 7.44190e+04 [5] 7.05190e+04 [10] 5.80070e+04 [7] 4.61460e+04 [12] 4.52980e+04 [11] 4.38410e+04 [1] 3.88810e+04 [4] 3.45550e+04 [9] 2.12300e+04 [3] 1.28000e+04 [13] 7.17700e+03 [15] 4.35200e+03 [14] 5.51000e+02 [0]
PAPI_LD_INS
    6.35227e+08 [6] 6.32206e+08 [4] 6.16166e+08 [3] 6.13817e+08 [13] 6.06047e+08 [7] 5.97347e+08 [11] 5.84328e+08 [12] 5.82940e+08 [5] 5.79466e+08 [15] 5.77377e+08 [1] 5.75886e+08 [10] 5.70380e+08 [2] 5.67354e+08 [9] 5.56041e+08 [14] 5.26647e+08 [8] 1.79181e+07 [0]
PAPI_LST_INS
    9.20272e+08 [6] 9.16616e+08 [4] 8.97418e+08 [3] 8.94351e+08 [13] 8.78798e+08 [7] 8.66030e+08 [11] 8.45788e+08 [12] 8.43395e+08 [15] 8.42097e+08 [5] 8.37001e+08 [1] 8.35126e+08 [10] 8.24308e+08 [2] 8.23006e+08 [9] 8.10795e+08 [14] 7.62764e+08 [8] 2.69933e+07 [0]
PAPI_RES_STL
    1.81292e+08 [8] 1.77814e+08 [9] 1.71292e+08 [14] 1.41056e+08 [12] 1.37402e+08 [4] 1.29071e+08 [2] 1.25902e+08 [1] 1.18725e+08 [7] 1.17092e+08 [15] 1.15538e+08 [10] 1.14667e+08 [13] 9.72356e+07 [3] 9.28595e+07 [5] 7.19624e+07 [11] 5.69897e+07 [6] 1.55340e+07 [0]
PAPI_SR_INS
    2.85045e+08 [6] 2.84410e+08 [4] 2.81253e+08 [3] 2.80533e+08 [13] 2.72751e+08 [7] 2.68683e+08 [11] 2.63929e+08 [15] 2.61459e+08 [12] 2.59624e+08 [1] 2.59240e+08 [10] 2.59158e+08 [5] 2.55652e+08 [9] 2.54754e+08 [14] 2.53928e+08 [2] 2.36116e+08 [8] 9.07517e+06 [0]
PAPI_TLB_DM
    1.00804e+05 [15] 1.00352e+05 [13] 7.32320e+04 [5] 6.90070e+04 [2] 6.28760e+04 [3] 6.03760e+04 [14] 5.99600e+04 [9] 5.86920e+04 [4] 5.81450e+04 [8] 5.63260e+04 [6] 5.34060e+04 [7] 5.33880e+04 [10] 4.65510e+04 [1] 4.23710e+04 [12] 3.72730e+04 [11] 2.33280e+04 [0]
PAPI_TLB_IM
    7.45376e+06 [13] 5.49380e+04 [4] 4.94360e+04 [2] 3.82500e+04 [15] 3.72010e+04 [8] 3.60010e+04 [7] 3.27530e+04 [11] 3.14270e+04 [9] 2.63420e+04 [1] 2.44660e+04 [3] 2.31700e+04 [12] 2.25030e+04 [5] 2.20460e+04 [10] 2.03260e+04 [14] 1.54420e+04 [6] 4.70300e+03 [0]
PAPI_TOT_CYC
    2.44788e+09 [10] 2.37731e+09 [5] 2.18174e+09 [7] 2.16233e+09 [6] 2.14819e+09 [2] 2.13835e+09 [4] 2.13690e+09 [9] 2.13471e+09 [15] 2.12744e+09 [11] 2.09913e+09 [13] 2.08307e+09 [3] 2.05735e+09 [14] 2.02205e+09 [12] 1.99487e+09 [1] 1.85977e+09 [8] 9.65905e+07 [0]
PAPI_TOT_INS
    1.59859e+09 [4] 1.59534e+09 [6] 1.54291e+09 [3] 1.54152e+09 [7] 1.53166e+09 [13] 1.51642e+09 [11] 1.46219e+09 [1] 1.45841e+09 [5] 1.45792e+09 [10] 1.45045e+09 [12] 1.44620e+09 [2] 1.43978e+09 [15] 1.43291e+09 [9] 1.38537e+09 [14] 1.32007e+09 [8] 8.53873e+07 [0]
Real cycles
    3.25905e+10 [8] 3.25873e+10 [6] 3.25834e+10 [10] 3.25823e+10 [12] 3.25817e+10 [5] 3.25687e+10 [2] 3.25684e+10 [15] 3.25672e+10 [7] 3.25622e+10 [0] 3.25620e+10 [13] 3.25612e+10 [3] 3.25598e+10 [11] 3.25593e+10 [4] 3.25593e+10 [14] 3.25571e+10 [9] 3.25559e+10 [1]
Real usecs
    1.22214e+07 [8] 1.22202e+07 [6] 1.22188e+07 [10] 1.22183e+07 [12] 1.22181e+07 [5] 1.22133e+07 [2] 1.22131e+07 [15] 1.22127e+07 [7] 1.22108e+07 [0] 1.22107e+07 [13] 1.22104e+07 [3] 1.22099e+07 [11] 1.22097e+07 [4]
    1.22097e+07 [14] 1.22089e+07 [9] 1.22085e+07 [1]
Virtual cycles
    1.62038e+10 [10] 1.61964e+10 [2] 1.61908e+10 [4] 1.61879e+10 [9] 1.61860e+10 [7] 1.61854e+10 [13] 1.61847e+10 [5] 1.61845e+10 [15] 1.61839e+10 [8] 1.61826e+10 [11] 1.61822e+10 [12] 1.61807e+10 [3] 1.61806e+10 [1] 1.61799e+10 [6] 1.61775e+10 [14] 1.69785e+08 [0]
Virtual usecs
    6.09165e+06 [10] 6.08888e+06 [2] 6.08676e+06 [4] 6.08568e+06 [9] 6.08495e+06 [7] 6.08474e+06 [13] 6.08449e+06 [5] 6.08440e+06 [15] 6.08418e+06 [8] 6.08369e+06 [11] 6.08353e+06 [12] 6.08296e+06 [3] 6.08292e+06 [1] 6.08267e+06 [6] 6.08177e+06 [14] 6.38310e+04 [0]
Wallclock usecs
    6.11718e+06 [8] 6.11669e+06 [10] 6.11660e+06 [6] 6.11534e+06 [12] 6.11482e+06 [5] 6.11426e+06 [2] 6.11415e+06 [15] 6.11403e+06 [7] 6.11351e+06 [13] 6.11240e+06 [3] 6.11238e+06 [0] 6.11218e+06 [11] 6.11203e+06 [14] 6.11201e+06 [4] 6.11195e+06 [1] 6.11141e+06 [9]

Derived Metric Descriptions:
MFLOPS_wallclock = PAPI_FP_OPS/Wallclock_usecs
MFLOPS = PAPI_FP_OPS / Real_usecs
IPC = PAPI_TOT_INS / PAPI_TOT_CYC
Flops_Per_Load_Store = PAPI_FP_OPS / PAPI_LST_INS
Flops_Per_L1_Data_Cache_Miss = PAPI_FP_OPS / PAPI_L1_DCM
Load_Store_Ratio = PAPI_LD_INS / PAPI_SR_INS
Instructions_Per_Dcache_Miss = PAPI_TOT_INS / PAPI_L1_DCM

Time:
Wallclock_seconds = Wallclock_usecs / 1000000
IO_seconds = IO_cycles / (clockmhz * 1000000)
No_Issue_Stall_seconds = PAPI_STL_ICY / (clockmhz * 1000000)
Resource_Stall_seconds = PAPI_RES_STL / (clockmhz * 1000000)
FP_Stall_seconds = PAPI_FP_OPS / (clockmhz * 1000000)
Memory_Stall_seconds = PAPI_MEM_SCY / (clockmhz * 1000000)
Max_L1_Miss_L2_Hit_Stall_seconds = (PAPI_LI_DCM - PAPI_L2_DCM) * L2_LATENCY / (clockmhz * 1000000)
Max_L2_Miss_L3_Hit_Stall_seconds = (PAPI_L2_DCM - PAPI_L3_DCM) * L3_LATENCY / (clockmhz * 1000000)
Max_Memory_Access_Stall_seconds = PAPI_L3_DCM* MEM_LATENCY / (clockmhz * 1000000)

Cycles:
Cycles_In_Domain = PAPI_TOT_CYC
Real_Cycles = Real_cycles
Running_Time_In_Domain_Percent = 100 * PAPI_TOT_CYC / Real_cycles
Virtual_Cycles = Virtual_cycles
IO_Cycles_Percent = 100 * IO_cycles / Real_cycles
MPI_Cycles_Percent = 100 * MPI_cycles / Real_cycles
MPI_Sync_Cycles_Percent = 100 * MPI_Sync_cycles / Real_cycles
Thread_Sync_Cycles_Percent = 100 * Thr_Sync_cycles / Real_cycles

Instructions:
Total_Instructions = PAPI_TOT_INS
Memory_Instructions_Percent = 100 * PAPI_LST_INS / PAPI_TOT_INS
Memory_Instructions_Percent = 100 * (PAPI_LD_INS + PAPI_SR_INS) / PAPI_TOT_INS unless (defined(Memory_Instructions_Percent))
FP_Instructions_Percent = 100 * PAPI_FP_INS / PAPI_TOT_INS
FP_Instructions_Percent_approx = 100 * PAPI_FP_OPS / PAPI_TOT_INS unless (defined(FP_Instructions_Percent))
Branch_Instructions_Percent = 100 * PAPI_BR_INS / PAPI_TOT_INS
Integer_Instructions_Percent = 100 * PAPI_INT_INS / PAPI_TOT_INS

Memory:
Load_Store_Ratio = PAPI_LD_INS / PAPI_SR_INS
L1_Data_Misses_Per_1000_Load_Stores = 1000 * PAPI_L1_DCM / PAPI_LST_INS
L1_Data_Misses_Per_1000_Load_Stores = 1000 * PAPI_L1_DCM / PAPI_L1_DCA unless (defined(L1_Data_Misses_Per_1000_Load_Stores))
L1_Instruction_Misses_Per_1000_Instructions = 1000 * PAPI_L1_ICM / PAPI_TOT_INS
L2_Data_Misses_Per_1000_L2_Load_Stores = 1000 * PAPI_L2_DCM / PAPI_L2_DCA
L2_Instruction_Misses_Per_1000_L2_Instructions = 1000 * PAPI_L2_ICM / PAPI_L2_ICA
Data_TLB_Misses_Per_1000_Load_Stores = 1000 * PAPI_TLB_DM / PAPI_LST_INS
Instruction_TLB_Misses_Per_1000_Instructions = 1000 * PAPI_TLB_IM / PAPI_TOT_INS
L1_Bandwidth_MBytes_per_second = (PAPI_LD_INS + PAPI_SR_INS) * WORD_SIZE / Wallclock_usecs

Stalls:
Resource_Stall_Cycles_Percent = 100 * PAPI_RES_STL / PAPI_TOT_CYC
Memory_Stall_Cycles_Percent = 100 * PAPI_MEM_SCY / PAPI_TOT_CYC
FP_Stall_Cycles_Percent = 100 * PAPI_FP_STAL / PAPI_TOT_CYC
No_Issue_Cycle_Percent = 100 * PAPI_STL_ICY / PAPI_TOT_CYC
Full_Issue_Cycle_Percent = 100 * PAPI_FUL_ICY / PAPI_TOT_CYC
FPU_Idle_Cycle_Percent = 100 * PAPI_FPU_IDL / PAPI_TOT_CYC
LSU_Idle_Cycle_Percent = 100 * PAPI_LSU_IDL / PAPI_TOT_CYC
Branch_Misprediction_Percent = 100 * PAPI_BR_MSP / PAPI_BR_INS

Arch Parameters:
L2_LATENCY_IN_CYCLES = L2_LATENCY
L3_LATENCY_IN_CYCLES = L3_LATENCY
MEMORY_LATENCY_IN_CYCLES = MEM_LATENCY
WORD_SIZE_IN_BYTES = WORD_SIZE
-------------------------------------------------------------------------------
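
The derived-metric formulas listed in this report can be checked by hand against the "Sum" column of the global counts. The following is a minimal Python sketch, not part of papiex: the names COUNTS, REPORTED and derive are hypothetical helpers introduced here for illustration, and the input values are the six-digit rounded figures printed in the report, so results should agree only to roughly that precision.

```python
import math

# Global "Sum" counts copied from this report (already rounded to six digits).
COUNTS = {
    "PAPI_TOT_INS": 2.22651e10,
    "PAPI_TOT_CYC": 3.20677e10,
    "PAPI_LD_INS": 8.83915e09,
    "PAPI_SR_INS": 3.98561e09,
    "PAPI_LST_INS": 1.28248e10,
    "PAPI_BR_INS": 5.17116e09,
    "PAPI_BR_MSP": 4.06750e05,
    "PAPI_RES_STL": 1.86443e09,
    "PAPI_FP_OPS": 1.16621e06,
    "Real_cycles": 5.21106e11,
    "Wallclock_usecs": 6.11238e06,
}
CLOCK_MHZ = 2660.0  # "Clockrate (MHz)" from the report header
WORD_SIZE = 8.0     # "WORD SIZE IN BYTES" from Arch Parameters

# Values printed under "Global derived data", used only for comparison.
REPORTED = {
    "MFLOPS_wallclock": 1.90795e-01,
    "IPC": 6.94317e-01,
    "Load_Store_Ratio": 2.21776e00,
    "Resource_Stall_seconds": 7.00915e-01,
    "Running_Time_In_Domain_Percent": 6.15377e00,
    "Memory_Instructions_Percent": 5.76002e01,
    "L1_Bandwidth_MBytes_per_second": 1.67853e04,
    "Resource_Stall_Cycles_Percent": 5.81405e00,
    "Branch_Misprediction_Percent": 7.86574e-03,
}


def derive(c, clock_mhz=CLOCK_MHZ, word_size=WORD_SIZE):
    """Apply a subset of the report's derived-metric formulas to raw counts."""
    hz = clock_mhz * 1_000_000
    return {
        "MFLOPS_wallclock": c["PAPI_FP_OPS"] / c["Wallclock_usecs"],
        "IPC": c["PAPI_TOT_INS"] / c["PAPI_TOT_CYC"],
        "Load_Store_Ratio": c["PAPI_LD_INS"] / c["PAPI_SR_INS"],
        "Resource_Stall_seconds": c["PAPI_RES_STL"] / hz,
        "Running_Time_In_Domain_Percent": 100 * c["PAPI_TOT_CYC"] / c["Real_cycles"],
        "Memory_Instructions_Percent": 100 * c["PAPI_LST_INS"] / c["PAPI_TOT_INS"],
        "L1_Bandwidth_MBytes_per_second": (
            (c["PAPI_LD_INS"] + c["PAPI_SR_INS"]) * word_size / c["Wallclock_usecs"]
        ),
        "Resource_Stall_Cycles_Percent": 100 * c["PAPI_RES_STL"] / c["PAPI_TOT_CYC"],
        "Branch_Misprediction_Percent": 100 * c["PAPI_BR_MSP"] / c["PAPI_BR_INS"],
    }


if __name__ == "__main__":
    for name, value in derive(COUNTS).items():
        # Inputs carry six significant digits, so compare with a loose tolerance.
        ok = math.isclose(value, REPORTED[name], rel_tol=1e-4)
        print(f"{name:34s} {value:.5e}  reported {REPORTED[name]:.5e}  match={ok}")
```

Only metrics whose input events were actually collected in this run are included; formulas that reference events absent from the Options line (for example PAPI_L3_DCM or PAPI_STL_ICY) cannot be evaluated from this report.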