[Info-vax] OpenMP Performance Problem
alex-lurk
alex.lurk at googlemail.com
Sun Aug 30 08:52:11 EDT 2009
Dear all,
I'm using OpenMP and have a serious performance problem.
I have a simple Fortran program with four parallel sections.
The relevant source code is below.
I'm using Microsoft Visual Studio 2005 Professional Edition.
The operating system is Microsoft Windows Server 2003 Standard x64
Edition, Service Pack 2.
It runs on an AMD Opteron system with 4 CPUs.
In Microsoft Visual Studio 2005 I have set the following options under
"Configuration -> Fortran -> Linker":
SubSystem: Console
Heap Reserve Size: 256000000
Heap Commit Size: 128000000
Stack Reserve Size: 256000000
Stack Commit Size: 128000000
Enable Large Addresses: Support Addresses Larger Than 2 GB
Terminal Server: Default
When I parallelize the four sections as shown below, the calculation
for all four sections takes about 80 seconds.
When I run it serially (without OpenMP), it is much faster: it
needs only about 50 seconds.
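The posted code times itself with CPU_TIME. The Fortran standard only says CPU_TIME returns a processor-dependent approximation of processor time; on many compilers, in a multithreaded program it reports the CPU time accumulated by all threads of the process, which can exceed the elapsed time. A minimal sketch for comparing the two clocks around the same region follows; the module `wall_timer` and its routines `wall_seconds` and `read_clocks` are hypothetical names invented for this illustration, built only on the standard CPU_TIME and SYSTEM_CLOCK intrinsics.

```fortran
! Hypothetical helper, not part of the original program: reads both the
! processor-time clock (CPU_TIME) and the wall clock (SYSTEM_CLOCK).
module wall_timer
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  private
  public :: wall_seconds, read_clocks
contains
  ! Current wall-clock reading in seconds; only differences between two
  ! calls are meaningful. Returns 0 if the processor has no clock.
  function wall_seconds() result(t)
    real(real64) :: t
    integer(int64) :: ticks, rate
    call system_clock(ticks, rate)
    if (rate > 0) then
      t = real(ticks, real64) / real(rate, real64)
    else
      t = 0.0_real64
    end if
  end function wall_seconds

  ! Reads processor time and wall-clock time at the same point, so that
  ! (cpu1 - cpu0) and (wall1 - wall0) can be compared around a region.
  subroutine read_clocks(cpu, wall)
    real(real64), intent(out) :: cpu, wall
    call cpu_time(cpu)
    wall = wall_seconds()
  end subroutine read_clocks
end module wall_timer
```

Calling `read_clocks` before and after the parallel region and comparing the two differences is one way to check the 80-second figure: if the CPU-time difference comes out noticeably larger than the wall-clock difference, that suggests the measurement is summing the work of several threads rather than reporting elapsed time. In an OpenMP build, `omp_get_wtime()` from `omp_lib` is the usual wall-clock alternative.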
I don't understand this difference; I expected the parallel OpenMP
version to be much faster, but it isn't :-(.
Do you know why?
Have I done something wrong?
Or should I use a different OpenMP directive instead of SECTIONS?
Or do you have experience with technologies other than OpenMP
for parallelizing Fortran programs?
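The four sections are not equally sized: they repeat the same N-element addition 1000, 2000, 3000 and 4000 times. Assuming every repetition costs the same and that four threads running at once do not slow each other down (both idealizations, since these loops compete for memory bandwidth), a rough upper bound on what one-section-per-thread SECTIONS can gain, using the 50-second serial figure, is:

```latex
\begin{aligned}
W_{\text{serial}}   &= 1000 + 2000 + 3000 + 4000 = 10000 \text{ repetitions},\\
W_{\text{parallel}} &\ge \max(1000,\, 2000,\, 3000,\, 4000) = 4000 \text{ repetitions},\\
S_{\max}            &= \frac{W_{\text{serial}}}{W_{\text{parallel}}} = \frac{10000}{4000} = 2.5,\\
T_{\text{parallel}} &\gtrsim \frac{50\ \text{s}}{2.5} = 20\ \text{s}.
\end{aligned}
```

Even in this idealized case the SECTIONS version cannot keep all four CPUs busy, because the threads that finish the shorter sections sit idle while the 4000-repetition section completes.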
Thanks a lot for your help!
Many greetings,
Alex
Source code start: _____________________
...
...
      INTEGER N, I, J
      PARAMETER (N=1000000)
      REAL A1(N), A2(N), A3(N)
      REAL B1(N), B2(N), B3(N)
      REAL C1(N), C2(N), C3(N)
      REAL D1(N), D2(N), D3(N)
      REAL parallel_time_begin, parallel_time_end
      REAL section1_time_begin, section1_time_end
      REAL section2_time_begin, section2_time_end
      REAL section3_time_begin, section3_time_end
      REAL section4_time_begin, section4_time_end
...
...
! Some initializations
      DO I = 1, N
        A1(I) = I + 1.5
        A2(I) = I + 22.35
        B1(I) = I + 1.5
        B2(I) = I + 22.35
        C1(I) = I + 1.5
        C2(I) = I + 22.35
        D1(I) = I + 1.5
        D2(I) = I + 22.35
      ENDDO
...
...
      PRINT *, '***** parallel dcal start *****'
      CALL CPU_TIME ( parallel_time_begin )
C$OMP PARALLEL PRIVATE(A1, A2, A3,
C$OMP&  B1, B2, B3,
C$OMP&  C1, C2, C3,
C$OMP&  D1, D2, D3,
C$OMP&  I, J)
C$OMP SECTIONS
C$OMP SECTION
      PRINT *, '***** 1. Section Start'
      CALL CPU_TIME ( section1_time_begin )
      DO J = 1, 1000
        DO I = 1, N
          A3(I) = A1(I) + A2(I)
        ENDDO
      ENDDO
      CALL CPU_TIME ( section1_time_end )
      PRINT *, '====> time of section1 was ',
     1  section1_time_end - section1_time_begin, ' seconds <===='
      PRINT *, '***** 1. Section End'
C$OMP SECTION
      PRINT *, '***** 2. Section Start'
      CALL CPU_TIME ( section2_time_begin )
      DO J = 1, 2000
        DO I = 1, N
          B3(I) = B1(I) + B2(I)
        ENDDO
      ENDDO
      CALL CPU_TIME ( section2_time_end )
      PRINT *, '====> time of section2 was ',
     1  section2_time_end - section2_time_begin, ' seconds <===='
      PRINT *, '***** 2. Section End'
C$OMP SECTION
      PRINT *, '***** 3. Section Start'
      CALL CPU_TIME ( section3_time_begin )
      DO J = 1, 3000
        DO I = 1, N
          C3(I) = C1(I) + C2(I)
        ENDDO
      ENDDO
      CALL CPU_TIME ( section3_time_end )
      PRINT *, '====> time of section3 was ',
     1  section3_time_end - section3_time_begin, ' seconds <===='
      PRINT *, '***** 3. Section End'
C$OMP SECTION
      PRINT *, '***** 4. Section Start'
      CALL CPU_TIME ( section4_time_begin )
      DO J = 1, 4000
        DO I = 1, N
          D3(I) = D1(I) + D2(I)
        ENDDO
      ENDDO
      CALL CPU_TIME ( section4_time_end )
      PRINT *, '====> time of section4 was ',
     1  section4_time_end - section4_time_begin, ' seconds <===='
      PRINT *, '***** 4. Section End'
C$OMP END SECTIONS NOWAIT
C$OMP END PARALLEL
...
...
Source code end: _____________________
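In the posted code all twelve arrays are listed in PRIVATE, so each thread works on its own fresh, uninitialized copies (FIRSTPRIVATE would copy the initial values in), and the results are written into those private copies rather than into the original A3, B3, C3 and D3. Assuming a 4-byte default REAL, that is 12 x 1,000,000 x 4 = 48,000,000 bytes of private storage per thread. The sketch below takes a different approach from SECTIONS: the arrays stay shared, and every repetition of the element-wise addition is split across all threads with an OpenMP DO worksharing loop. The module `vec_add_omp` and subroutine `repeated_add` are hypothetical names for this illustration only.

```fortran
! Hypothetical sketch, not the poster's program: the arrays are shared and
! each repetition of the element-wise addition is divided among all threads.
! Without OpenMP enabled the !$omp lines are comments and it runs serially.
module vec_add_omp
  implicit none
  private
  public :: repeated_add
contains
  ! Computes z = x + y element by element, nrep times over (mirroring the
  ! J loops in the post). The I loop is shared among the team's threads.
  subroutine repeated_add(x, y, z, nrep)
    real, intent(in)  :: x(:), y(:)
    real, intent(out) :: z(:)
    integer, intent(in) :: nrep
    integer :: i, j

    ! One parallel region covers all repetitions, instead of entering a
    ! new parallel region nrep times; x, y, z and nrep are shared.
    !$omp parallel private(i, j)
    do j = 1, nrep
      ! Worksharing loop: iterations of I are divided among the threads.
      ! The implicit barrier at END DO keeps the repetitions in step.
      !$omp do
      do i = 1, size(z)
        z(i) = x(i) + y(i)
      end do
      !$omp end do
    end do
    !$omp end parallel
  end subroutine repeated_add
end module vec_add_omp
```

With `USE vec_add_omp` in the calling program, each section would become a plain call such as `CALL repeated_add(A1, A2, A3, 1000)`, issued outside any SECTIONS construct, so all threads share every addition and the 1000:2000:3000:4000 imbalance between sections no longer limits the speedup. A loop this simple mostly streams data from memory, so the speedup may still be capped by memory bandwidth rather than by the number of CPUs.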