Channel: Intel® Software - Intel® Many Integrated Core Architecture (Intel MIC Architecture)

Porting __m128i instructions to Phi

I have a very complex project that has been heavily optimized to take advantage of SSE2 and SSSE3 instructions. For example, my project has the following defines:

    #define ADDS8(a, b) (_mm_adds_epi8((a),(b)))
    #define SUB8(a, b) (_mm_sub_epi8((a),(b)))
    #define SUBS8(a, b) (_mm_subs_epi8((a), (b)))
    #define ABS8(a) (_mm_abs_epi8((a)))
    #define ADD64(a, b) (_mm_add_epi64((a), (b)))

I'm trying to evaluate whether this program could benefit from running on Xeon Phi. My initial porting approach was to reimplement the __m128i intrinsics in plain C++ and rely on the Intel compiler to vectorize the resulting code. This is roughly what I did:

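#ifndef __MIC__
    #include <emmintrin.h>  // SSE2
    #include <tmmintrin.h>  // SSSE3 (_mm_abs_epi8)

    typedef __m128i M128i;

    #define SUB8(a, b) (_mm_sub_epi8((a),(b)))
    #define ABS8(a) (_mm_abs_epi8((a)))
    #define ADD64(a, b) (_mm_add_epi64((a), (b)))
#else
    // Coprocessor build: emulate the 128-bit type in plain C++
    #include <stdint.h>
    #include <stdlib.h>   // abs()

    typedef union
    {
        int64_t  i64[2];
        uint64_t u64[2];   // unsigned view, used for wrap-around 64-bit adds
        int32_t  i32[4];
        int16_t  i16[8];
        int8_t   i8[16];
    } M128i;

    // Lane-wise wrapping 8-bit subtract, like _mm_sub_epi8
    inline M128i SUB8(const M128i& a, const M128i& b)
    {
        M128i res;
        for (int i=0; i<16; i++) {
            res.i8[i] = (int8_t)(a.i8[i] - b.i8[i]);
        }
        return res;
    }

    // Lane-wise 8-bit absolute value, like _mm_abs_epi8 (|-128| stays 0x80)
    inline M128i ABS8(const M128i& a)
    {
        M128i res;
        for (int i=0; i<16; i++) {
            res.i8[i] = (int8_t)abs(a.i8[i]);
        }
        return res;
    }

    // 64-bit add on the unsigned view so it wraps like _mm_add_epi64
    // instead of hitting signed-overflow undefined behavior
    inline M128i ADD64(const M128i& a, const M128i& b)
    {
        M128i res;
        res.u64[0] = a.u64[0] + b.u64[0];
        res.u64[1] = a.u64[1] + b.u64[1];
        return res;
    }
#endif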
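
The port covers SUB8, ABS8 and ADD64, but not the saturating macros ADDS8 and SUBS8 from the first listing. A minimal sketch of scalar fallbacks for those two, written in the same style, assuming the port's union layout (redeclared here so the snippet compiles on its own). Each lane is clamped to [-128, 127], which is what _mm_adds_epi8 and _mm_subs_epi8 do; the helper name `sat8` is hypothetical.

```cpp
#include <cstdint>

// Stand-in for the M128i union from the port (redeclared so this compiles alone).
typedef union
{
    int64_t i64[2];
    int32_t i32[4];
    int16_t i16[8];
    int8_t  i8[16];
} M128i;

// Hypothetical helper: clamp an int to the signed 8-bit range [-128, 127].
static inline int8_t sat8(int v)
{
    return (int8_t)(v > 127 ? 127 : (v < -128 ? -128 : v));
}

// Scalar equivalent of _mm_adds_epi8: lane-wise signed add with saturation.
inline M128i ADDS8(const M128i& a, const M128i& b)
{
    M128i res;
    for (int i = 0; i < 16; i++) {
        res.i8[i] = sat8(a.i8[i] + b.i8[i]);   // operands promoted to int, no overflow
    }
    return res;
}

// Scalar equivalent of _mm_subs_epi8: lane-wise signed subtract with saturation.
inline M128i SUBS8(const M128i& a, const M128i& b)
{
    M128i res;
    for (int i = 0; i < 16; i++) {
        res.i8[i] = sat8(a.i8[i] - b.i8[i]);
    }
    return res;
}
```

For example, a lane holding 100 in a and 70 in b gives 127 from ADDS8, whereas the wrapping 8-bit add would give -86.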

The ported code runs on the Phi, and I've verified that the Intel compiler vectorized SUB8 and ABS8. Still, it runs about 10x slower on the Phi than on a Xeon processor. Is there another way to port the __m128i intrinsics to the Phi without rewriting the whole project? Perhaps using __m512i intrinsics? I would appreciate any advice.
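
One possible factor, offered only as a hedged guess: each SUB8 or ABS8 call works on just 16 bytes, so even when vectorized it fills at most a quarter of a 512-bit (64-byte) register, and the per-call overhead is never amortized. A minimal sketch of the alternative, assuming the data can be regrouped into contiguous byte arrays (the functions `sub8_batch` and `abs8_batch` are hypothetical, not part of the project): one loop over many elements gives the compiler long trip counts that it can map onto full-width vectors.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical batched replacements for SUB8/ABS8. Instead of one 16-byte
// M128i per call, each function walks a whole array (e.g. n = 16 * count),
// so the loop is long enough for the compiler to use full-width vectors.
// __restrict promises the arrays do not overlap (supported by GCC, Clang, ICC).

void sub8_batch(int8_t* __restrict out, const int8_t* __restrict a,
                const int8_t* __restrict b, std::size_t n)
{
    for (std::size_t i = 0; i < n; i++) {
        // Same wrap-around result as _mm_sub_epi8, lane by lane.
        out[i] = (int8_t)(a[i] - b[i]);
    }
}

void abs8_batch(int8_t* __restrict out, const int8_t* __restrict a, std::size_t n)
{
    for (std::size_t i = 0; i < n; i++) {
        // Mirrors _mm_abs_epi8: |-128| does not fit and stays 0x80 (-128).
        int v = a[i];
        out[i] = (int8_t)(v < 0 ? -v : v);
    }
}
```

Whether this helps depends on whether the project's data flow allows many 128-bit values to be processed together; code that chains dependent operations on a single value would need restructuring first.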
