Hello Intel,
I wrote "a kind of" meta compiler that generates SIMD code for multiple platforms (x86, Power, Phi, etc.). I will present my work at ISC 2014 in June. I am now preparing for the Supercomputing conference, where I would like to present results on the Phi platform, but I have an issue.
I am trying to calculate the GFLOP/s of my application. A first approach would be to count the number of operations and divide by the elapsed time, as is usually done for the dgemm benchmark. Unfortunately I have thousands of lines ...
I read in an Intel article: http://software.intel.com/en-us/articles/best-know-method-estimating-flops-for-workloads-running-on-the-intel-xeon-phi-coprocessor
that I can estimate GFLOP/s by dividing the VPU_ELEMENTS_ACTIVE counter by the execution time. It is a rough estimate, but enough for a first approach.
I checked this on my code: for float I get a counter value of 45560136680 in 0.013 [s], and for double 91938275814 in 0.0281985 [s].
So I get about the same number of GFLOP/s for float and double, although I should have twice as many FLOPS for float.
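To make the arithmetic explicit, here is a minimal sketch of the estimate I am computing. It assumes the two numbers above are the raw VPU_ELEMENTS_ACTIVE counts and that one active element corresponds to one floating-point operation, which is only the rough approximation the article suggests. The function name `estimate_gflops` is just illustrative, not part of any Intel tool.

```python
def estimate_gflops(vpu_elements_active: int, elapsed_seconds: float) -> float:
    """Rough GFLOP/s estimate: active vector elements divided by run time.

    Assumption: each active VPU element counts as one floating-point
    operation. This is an approximation, not a true FLOP count.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return vpu_elements_active / elapsed_seconds / 1e9


# Counter values and timings from my two runs (float and double).
float_rate = estimate_gflops(45_560_136_680, 0.013)
double_rate = estimate_gflops(91_938_275_814, 0.0281985)

print(f"float : {float_rate:.1f} (estimated GFLOP/s)")
print(f"double: {double_rate:.1f} (estimated GFLOP/s)")
print(f"float/double ratio: {float_rate / double_rate:.2f}")
```

The float/double ratio comes out close to 1 rather than the 2 I expected, which is why I doubt this estimate.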
Another Intel article, https://software.intel.com/en-us/articles/optimization-and-performance-tuning-for-intel-xeon-phi-coprocessors-part-2-understanding, says: "We would like to be able to measure efficiency in terms of floating-point operations per second, as that can easily be compared to the peak floating-point performance of the machine. However, the Intel Xeon Phi coprocessor does not have events to count floating-point operations."
So, is it possible to measure GFLOP/s on the Phi or not?
Best,
++t