Channel: Intel® Software - Intel® Many Integrated Core Architecture (Intel MIC Architecture)

Performance difference depending on datatype

Hi all, and thanks in advance for your help,

I wrote a simple Fortran program that performs an element-wise multiply-add on two arrays: one is always double precision, and the other changes datatype from test to test.

PROGRAM datatype

USE omp_lib

implicit none

double precision, allocatable,dimension(:,:,:) :: A,B,C
integer(kind=1), allocatable,dimension(:,:,:) :: D
integer(kind=4), allocatable,dimension(:,:,:) :: E
integer(kind=8), allocatable,dimension(:,:,:) :: F
real, allocatable,dimension(:,:,:) :: G
LOGICAL, allocatable,dimension(:,:,:) :: H

integer :: t,i,j,k,size = 500,repetition=40
double precision :: time,time1

ALLOCATE(A(size,size,size),B(size,size,size),C(size,size,size))

A = 4.
B = 1.

time = omp_get_wtime()
do t = 1,repetition
	do i=1,size
		do j=1,size
			do k=1,size
			!dir$ vector aligned
			c(k,j,i) = a(k,j,i) * b(k,j,i) +5.2
			enddo
		enddo
	enddo
enddo
time = omp_get_wtime() - time

print *,"TIME double",time/DBLE(repetition)

DEALLOCATE(B)

ALLOCATE(G(size,size,size))
G = 240.

time = omp_get_wtime()
do t = 1,repetition
	do i=1,size
		do j=1,size
			do k=1,size
			!dir$ vector aligned
			c(k,j,i) = a(k,j,i) * g(k,j,i) +5.2
			enddo
		enddo
	enddo
enddo
time = omp_get_wtime() - time

print *,"TIME float",time/DBLE(repetition)

DEALLOCATE(G)

ALLOCATE(D(size,size,size))
D = 240

time = omp_get_wtime()
do t = 1,repetition
	do i=1,size
		do j=1,size
			do k=1,size
			!dir$ vector aligned
			c(k,j,i) = a(k,j,i) * d(k,j,i) +5.2
			enddo
		enddo
	enddo
enddo
time = omp_get_wtime() - time

print *,"TIME int8",time/DBLE(repetition)

DEALLOCATE(D)

ALLOCATE(E(size,size,size))
e = 240

time = omp_get_wtime()
do t = 1,repetition
	do i=1,size
		do j=1,size
			do k=1,size
			!dir$ vector aligned
			c(k,j,i) = a(k,j,i) * e(k,j,i) +5.2
			enddo
		enddo
	enddo
enddo
time = omp_get_wtime() - time

print *,"TIME int32",time/DBLE(repetition)

DEALLOCATE(E)

ALLOCATE(F(size,size,size))
f = 240

time = omp_get_wtime()
do t = 1,repetition
	do i=1,size
		do j=1,size
			do k=1,size
			!dir$ vector aligned
			c(k,j,i) = a(k,j,i) * f(k,j,i) +5.2
			enddo
		enddo
	enddo
enddo
time = omp_get_wtime() - time

print *,"TIME int64",time/DBLE(repetition)

DEALLOCATE(F)

ALLOCATE(H(size,size,size))
h = .True.

time = omp_get_wtime()
do t = 1,repetition
	do i=1,size
		do j=1,size
			do k=1,size
			!dir$ vector aligned
			c(k,j,i) = a(k,j,i) * h(k,j,i) +5.2
			enddo
		enddo
	enddo
enddo
time = omp_get_wtime() - time

print *,"TIME logical",time/DBLE(repetition)

END PROGRAM

I ran this code on a Broadwell Intel(R) Xeon(R) E5-2697 v4 @ 2.30GHz and on an Intel Xeon Phi 7250 (KNL).

BROADWELL (1 core)

 TIME double  0.314651775360107
 TIME float  0.256021851301193
 TIME int8  0.218752950429916
 TIME int32  0.245272749662399
 TIME int64  0.319928669929504
 TIME logical  0.245576351881027

-------------------------------------------------

KNL (1 core)

 TIME double  0.545190346240997
 TIME float  0.608061379194260
 TIME int8  0.749213725328445
 TIME int32  0.718595725297928
 TIME int64  0.730906349420547
 TIME logical  0.544638276100159 

On the Broadwell architecture the best performance was obtained with double * int8, and the worst with double * int64 and double * double (about 0.32 s and 0.31 s). I think the better int8 performance comes from better cache use: the smaller array means less memory traffic, which hides the cost of converting int8 to double. Is that right?

I don't understand why the behaviour on KNL is the opposite: there, double * double and double * logical are the fastest and double * int8 is the slowest. I analyzed the compiler optimization report, but in both cases the double precision type determines the vector length, and therefore the number of operations per clock cycle.

Can someone help me understand this behaviour?
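
As a rough check of the memory-traffic idea, the bytes moved per sweep can be estimated from the array sizes and compared with the timings above. This is only a sketch under stated assumptions: each 500x500x500 array is far larger than any cache, so every sweep streams from memory; the default LOGICAL is taken to be 4 bytes; and the extra read traffic that a cache may generate when writing C (write-allocate) is ignored. The symbols N, s and V(s) are introduced here for illustration only.

```latex
\begin{aligned}
N &= 500^3 = 1.25\times10^{8} \quad \text{elements per array} \\
V(s) &= (8 + 8 + s)\,N \quad \text{bytes per sweep: read A, write C, read an $s$-byte operand} \\
V(8) &= 24N = 3.0\times10^{9}, \qquad V(4) = 20N = 2.5\times10^{9}, \qquad V(1) = 17N = 2.125\times10^{9} \\[4pt]
\frac{t_{\text{int8}}}{t_{\text{double}}}\Big|_{\text{predicted}} &= \frac{17}{24} \approx 0.71,
\qquad
\frac{t_{\text{int8}}}{t_{\text{double}}}\Big|_{\text{Broadwell}} = \frac{0.2188}{0.3147} \approx 0.70 \\[4pt]
\frac{t_{\text{float}}}{t_{\text{double}}}\Big|_{\text{predicted}} &= \frac{20}{24} \approx 0.83,
\qquad
\frac{t_{\text{float}}}{t_{\text{double}}}\Big|_{\text{Broadwell}} = \frac{0.2560}{0.3147} \approx 0.81 \\[4pt]
\frac{V(8)}{t_{\text{double}}}\Big|_{\text{Broadwell}} &= \frac{3.0\times10^{9}}{0.3147\ \text{s}} \approx 9.5\times10^{9}\ \text{bytes/s} \\[4pt]
\frac{t_{\text{int8}}}{t_{\text{double}}}\Big|_{\text{KNL}} &= \frac{0.7492}{0.5452} \approx 1.37
\end{aligned}
```

Under these assumptions the Broadwell ratios are close to what a purely bandwidth-bound loop would give, which is consistent with the smaller operand simply moving fewer bytes. The KNL ratio is above 1, so the same simple model does not describe the KNL timings.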

Thanks

Best regards

Eric
