Dear Intel Forum Gurus,
I am learning VTune by using the VTune 2016 GUI to measure the bandwidth of the STREAM benchmark on a KNC 5110P. I compiled stream.c with the following options:
icpc -mmic -O3 -g -qopenmp -DSTREAM_ARRAY_SIZE=64000000 -qopt-prefetch-distance=64,8 -qopt-streaming-cache-evict=0 -qopt-streaming-stores never -restrict stream.c
Streaming stores are omitted because I want to try core-event-based sampling (more on that later).
First I tried to see whether the GUI could give me the bandwidth directly. This forum thread (https://software.intel.com/en-us/forums/intel-vtune-amplifier-xe/topic/518185#comment-1793935) indicates that VTune 2013 can report the bandwidth directly, but I didn't see "Bandwidth" among my available analysis types (first screenshot below). Section 6.2 of this article (https://software.intel.com/en-us/articles/tutorial-on-intel-xeon-phi-processor-optimization) indicates that VTune 2017 will give a nice bandwidth histogram, but I didn't see the Memory Usage viewpoint within the Memory Access analysis type (second screenshot below).
Next I tried to compute the bandwidth myself using the formulas given in Section 5.4 of this article (https://software.intel.com/en-us/articles/optimization-and-performance-tuning-for-intel-xeon-phi-coprocessors-part-2-understanding), specifically:
Read bandwidth = (L2_DATA_READ_MISS_MEM_FILL + L2_DATA_WRITE_MISS_MEM_FILL + HWP_L2MISS) * 64 / CPU_CLK_UNHALTED
Write bandwidth = (L2_VICTIM_REQ_WITH_DATA + SNP_HITM_L2) * 64 / CPU_CLK_UNHALTED
Bandwidth = Read bandwidth + Write bandwidth
This is also why I compiled without streaming stores: these events do not account for them. I created a custom analysis type to record all the necessary events and applied the formula (third screenshot below) to the Triad kernel (highlighted line). In the denominator I divide CPU_CLK_UNHALTED by 60 because I'm fairly sure CPU_CLK_UNHALTED is the sum of clock ticks over all 60 cores, so dividing by 60 should give the wall-clock cycles of the function.
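To make the arithmetic concrete, this is essentially the calculation I'm doing. The event counts below are made-up placeholders, not my measured numbers, and I'm assuming the 5110P's nominal 1.053 GHz clock:

#include <stdio.h>

int main(void)
{
    /* Placeholder event counts for the Triad kernel -- NOT my measured
       values, just to illustrate the arithmetic. */
    double l2_data_read_miss_mem_fill  = 1.0e9;
    double l2_data_write_miss_mem_fill = 1.0e8;
    double hwp_l2miss                  = 5.0e8;
    double l2_victim_req_with_data     = 8.0e8;
    double snp_hitm_l2                 = 1.0e7;
    double cpu_clk_unhalted            = 6.0e10; /* summed over all cores (I think) */

    const double cores = 60.0;    /* 5110P core count */
    const double freq  = 1.053e9; /* 5110P nominal clock, Hz (my assumption) */

    /* Each event corresponds to a 64-byte cache-line transfer. */
    double read_bytes  = (l2_data_read_miss_mem_fill
                        + l2_data_write_miss_mem_fill
                        + hwp_l2miss) * 64.0;
    double write_bytes = (l2_victim_req_with_data + snp_hitm_l2) * 64.0;

    /* Divide the summed clock ticks by the core count to get wall-clock
       cycles, then by the frequency to get seconds. */
    double seconds = (cpu_clk_unhalted / cores) / freq;

    double bw_gbs = (read_bytes + write_bytes) / seconds / 1.0e9;
    printf("Bandwidth = %.2f GB/s\n", bw_gbs);
    return 0;
}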
My calculation from the metrics gave 182.75 GB/s, but the STREAM executable itself reported "Triad: 101985.9 MB/s." That is in the same ballpark but still a pretty big difference, which makes me suspicious of my calculation.
My questions are:
1. Is there a way that I overlooked to get the GUI to tell me the bandwidth directly (perhaps computed under the hood using memory controller events instead of core events)?
2. Am I applying the formula with the core events correctly? If so, why is there such a large discrepancy with the output of the STREAM executable?
Thanks in advance for your help,
Michael