Hello,
I discovered that icpc (17.01), when targeting -xmic-avx512 (KNL), clears the destination register before the corresponding gather instruction by emitting
vpxord %zmm2, %zmm2, %zmm2
although the gather is effectively unmasked (its mask register is set to all ones), which implies that the entire register will be written.
In Jeffers and Sodani's book on KNL programming, the authors show a similar line in Fig. 6.32, but unfortunately they only give an unsatisfying comment on it: "Clearing the contents of the zmm1 registers for Gather/Scatter Operation".
Can someone please explain to me the reasoning behind the introduction of this line?
Regards,
Michael
Appendix:
Full Loop in ASM:
..B1.7: # Preds ..B1.7 ..B1.6
# Execution count [5.00e+00]
        vpxord      %zmm2, %zmm2, %zmm2                 #18.12 c1
        kxnorw      %k0, %k0, %k1                       #18.12 c1
        addl        $1, %eax                            #17.2 c1
        kxnorw      %k0, %k0, %k2                       #20.3 c3
        vgatherdpd  (%r12,%ymm0,8), %zmm2{%k1}          #18.12 c3
        vaddpd      %zmm2, %zmm1, %zmm3                 #19.12 c9 stall 2
        vscatterdpd %zmm3, (%r12,%ymm0,8){%k2}          #20.3 c15 stall 2
        cmpl        $250, %eax                          #17.2 c15
        jb          ..B1.7        # Prob 82%            #17.2 c17
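To make the loop easier to follow, here is a scalar C++ sketch of what the assembly above appears to do. It is reconstructed from the listing only, not copied from the linked source, and the names gather_model, scatter_model and gather_add_scatter are hypothetical. It assumes the usual AVX-512 behaviour that a gather writes only the lanes whose mask bit is set and leaves the other lanes of the destination unchanged, so with an all-ones mask every lane is overwritten.

```cpp
#include <array>
#include <cstdint>

constexpr int kLanes = 8;  // one zmm register holds 8 doubles

using Vec = std::array<double, kLanes>;
using Idx = std::array<std::int32_t, kLanes>;  // 8 x 32-bit indices (ymm0)

// Scalar model of vgatherdpd: lanes with their mask bit set are loaded from
// base[idx[lane]]; all other lanes keep the previous contents of dst.
void gather_model(Vec& dst, const double* base, const Idx& idx,
                  std::uint8_t mask) {
    for (int lane = 0; lane < kLanes; ++lane) {
        if (mask & (1u << lane)) {
            dst[lane] = base[idx[lane]];
        }
    }
}

// Scalar model of vscatterdpd: lanes with their mask bit set are stored.
void scatter_model(double* base, const Idx& idx, const Vec& src,
                   std::uint8_t mask) {
    for (int lane = 0; lane < kLanes; ++lane) {
        if (mask & (1u << lane)) {
            base[idx[lane]] = src[lane];
        }
    }
}

// Hypothetical driver mirroring the loop body in the listing.
void gather_add_scatter(double* data, const Idx& idx, const Vec& addend,
                        int iterations) {
    for (int it = 0; it < iterations; ++it) {
        Vec loaded{};                           // vpxord: zeroed destination
        gather_model(loaded, data, idx, 0xFF);  // kxnorw: all-ones mask k1
        Vec sum;
        for (int lane = 0; lane < kLanes; ++lane) {
            sum[lane] = loaded[lane] + addend[lane];  // vaddpd
        }
        scatter_model(data, idx, sum, 0xFF);    // kxnorw: all-ones mask k2
    }
}
```

In this model the zero-initialisation of loaded has no effect on the result when the mask is 0xFF, which is exactly the case in my loop and the reason I am asking about the vpxord.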
Actual C++ Code:
https://pastebin.com/wGpcFiGm
Compiled with:
icpc -std=c++11 -O3 -xmic-avx512 gather.cpp -o gather.out
icpc -std=c++11 -O3 -xmic-avx512 gather.cpp -o gather.asm -S -fverbose-asm