Channel: Intel® Software - Intel® Many Integrated Core Architecture (Intel MIC Architecture)

KNL - Compiler clearing dest. register for gather/scatter


Hello,

I discovered that icpc (17.01), when compiling for -xmic-avx512 (KNL), clears the destination register before the corresponding gather instruction by inserting
    vpxord %zmm2, %zmm2, %zmm2
even though I am using a non-masked gather, which implies that the entire register will be written.

In Jeffers' and Sodani's book on KNL programming, the authors show a similar line in Fig. 6.32, but unfortunately they give only an unsatisfying comment on it: "Clearing the contents of the zmm1 registers for Gather/Scatter Operation".

Can someone please explain to me the reasoning behind the introduction of this line?

Regards, 
Michael

Appendix:
Full Loop in ASM:

..B1.7: # Preds ..B1.7 ..B1.6
    # Execution count [5.00e+00]

    vpxord      %zmm2, %zmm2, %zmm2                           #18.12 c1
    kxnorw      %k0, %k0, %k1                                 #18.12 c1
    addl        $1, %eax                                      #17.2 c1
    kxnorw      %k0, %k0, %k2                                 #20.3 c3
    vgatherdpd   (%r12,%ymm0,8), %zmm2{%k1}                   #18.12 c3
    vaddpd      %zmm2, %zmm1, %zmm3                           #19.12 c9 stall 2
    vscatterdpd   %zmm3, (%r12,%ymm0,8){%k2}                  #20.3 c15 stall 2
    cmpl        $250, %eax                                    #17.2 c15
    jb          ..B1.7        # Prob 82%                      #17.2 c17
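
For readers without the assembly in their head, here is a minimal scalar C++ sketch of what the loop above does per iteration. It is not the code from the pastebin link, and all names (masked_gather, masked_scatter, gather_add_scatter, Vec, Idx) are hypothetical. It assumes the documented AVX-512 behaviour that a gather only writes the lanes whose mask bit is set and leaves the other lanes of the destination unchanged. In the listing the mask is set to all ones by kxnorw, so every lane is written. The sketch only models the semantics; it does not settle why the compiler emits the clear.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kLanes = 8;  // one zmm register holds 8 doubles

using Vec = std::array<double, kLanes>;        // stands in for a zmm register
using Idx = std::array<std::int32_t, kLanes>;  // stands in for the ymm index register

// Models "vgatherdpd (base,idx,8), dst{mask}": only lanes whose mask bit is
// set are loaded; all other lanes of dst keep their previous contents.
void masked_gather(Vec& dst, const std::vector<double>& base,
                   const Idx& idx, std::uint8_t mask) {
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        if (mask & (1u << lane)) {
            assert(idx[lane] >= 0 &&
                   static_cast<std::size_t>(idx[lane]) < base.size());
            dst[lane] = base[static_cast<std::size_t>(idx[lane])];
        }
    }
}

// Models "vscatterdpd src, (base,idx,8){mask}": only lanes whose mask bit is
// set are stored.
void masked_scatter(std::vector<double>& base, const Idx& idx,
                    const Vec& src, std::uint8_t mask) {
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        if (mask & (1u << lane)) {
            assert(idx[lane] >= 0 &&
                   static_cast<std::size_t>(idx[lane]) < base.size());
            base[static_cast<std::size_t>(idx[lane])] = src[lane];
        }
    }
}

// Models the loop body of the listing, repeated `iterations` times.
void gather_add_scatter(std::vector<double>& data, const Idx& idx,
                        const Vec& addend, int iterations) {
    for (int i = 0; i < iterations; ++i) {
        Vec tmp{};                            // vpxord: zero the destination
        masked_gather(tmp, data, idx, 0xFF);  // kxnorw + vgatherdpd, all lanes
        for (std::size_t lane = 0; lane < kLanes; ++lane) {
            tmp[lane] += addend[lane];        // vaddpd
        }
        masked_scatter(data, idx, tmp, 0xFF); // kxnorw + vscatterdpd, all lanes
    }
}
```

Calling gather_add_scatter(data, idx, addend, 250) mirrors the 250 iterations of the loop. Calling masked_gather with a partial mask such as 0x01 shows that the remaining lanes of the destination are carried over from its previous value.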

Actual C++ Code:
https://pastebin.com/wGpcFiGm

Compiled with:
icpc -std=c++11 -O3 -xmic-avx512 gather.cpp -o gather.out
icpc -std=c++11 -O3 -xmic-avx512 gather.cpp -o gather.asm -S -fverbose-asm
 

 

 

