Hi,
I am running performance tests on offload code for Xeon Phi. I read the hardware counters with PAPI and derive the performance numbers using the calculation methods explained here:
https://software.intel.com/en-us/articles/optimization-and-performance-t...
However, the memory bandwidth section (5.4) of the guide says to use an event named HWP_L2MISS to count the hardware prefetches that missed L2. VTune apparently provides this event, but it does not appear in the list of available events in the PMU document here:
https://software.intel.com/sites/default/files/forum/278102/intelr-xeon-...
I assume it is a derived metric that VTune computes for you. Does anyone know how it should be calculated? Could I add the prefetch0 and prefetch1 requests that missed L2, as given by the counters L2_DATA_PF1_MISS and L2_DATA_PF2_MISS, or is there more to it?
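To make the question concrete, here is a minimal sketch of the sum I have in mind. The function name and the sample counts are placeholders made up for illustration, not measurements, and whether this sum matches what VTune reports as HWP_L2MISS is exactly the part I am unsure about.

```python
def estimate_hwp_l2miss(counts):
    """Candidate estimate of hardware prefetches that missed L2.

    Assumption (unverified): HWP_L2MISS is roughly
    L2_DATA_PF1_MISS + L2_DATA_PF2_MISS.
    """
    return counts["L2_DATA_PF1_MISS"] + counts["L2_DATA_PF2_MISS"]


# Placeholder values standing in for counts read back from PAPI.
sample_counts = {"L2_DATA_PF1_MISS": 1200, "L2_DATA_PF2_MISS": 800}
estimate = estimate_hwp_l2miss(sample_counts)
print(estimate)
```

The result would then stand in for HWP_L2MISS in the bandwidth calculation from section 5.4, if the assumption holds.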
Thanks,
Tim