*Serial processing programming [#d576a6ab]
#contents

This page summarizes [[porting your programs from the former system to the current system>Porting your programs]].

**Compile Commands [#cbaa47fc]
-ifort
--for Fortran programs
-icc
--for C programs
-icpc
--for C++ programs

We recommend the Intel compilers, which can deliver good performance on Xeon processors.
The GNU compilers can also be used.

**Compiler Options [#cb56fd9a]
-Optimization Options
--Recommended optimization options
~
The following optimization options are recommended for programs that have already been debugged.
---eic or eich
 -O3 -xAVX

---eicp
 -O3 -xCORE-AVX2

--Optimization Levels
|~Option|~Description|
|-O0|Disables all optimizations|
|-O1|Enables optimizations for speed and disables some optimizations that increase code size and affect speed.|
|-O2|Enables optimizations for speed. This is the generally recommended optimization level (default).|
|-O3|Performs O2 optimizations and enables more aggressive loop transformations.|

--Code Generation Options
|~Option|~Description|
|-xAVX|May generate AVX instructions|
|-xCORE-AVX2|May generate AVX2 instructions|
~
#ref(AVX_e.jpg,left,nowrap,80%)
~
Processors that support AVX2 also provide FMA (Fused Multiply-Add) instructions, which -xCORE-AVX2 allows the compiler to generate.
~
A=A*B+C
~
An FMA instruction evaluates the above expression in a single instruction.

---eic and eich support the -xAVX option.
---eicp supports the -xAVX and -xCORE-AVX2 options.
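
The numerical effect of fusing can be illustrated outside the compiler. The following Python sketch does not show how the hardware or ifort implements FMA; it only emulates the single rounding of a fused multiply-add with exact rational arithmetic. The function names and sample values are chosen for illustration.

```python
from fractions import Fraction


def unfused_multiply_add(a, b, c):
    """Separate multiply and add: the product is rounded, then the sum is rounded."""
    return a * b + c


def fused_multiply_add(a, b, c):
    """Emulated FMA: a*b+c is evaluated exactly and rounded to double once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


if __name__ == "__main__":
    a = 1.0 + 2.0 ** -30
    b = 1.0 - 2.0 ** -30
    c = -1.0
    print("unfused:", unfused_multiply_add(a, b, c))
    print("fused  :", fused_multiply_add(a, b, c))
```

With these inputs the exact product is 1 - 2^-60, which rounds to 1.0 in double precision, so the unfused form returns 0.0 while the fused form keeps -2^-60. Small differences of this kind can appear between -xAVX and -xCORE-AVX2 builds of the same program.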

--Floating-Point Operation Options
|~Options|~Description|
|-no-prec-div|[Improves performance] Enables optimization of floating-point divides|
|-fp-model fast [=1/2]|[Improves performance] Enables more aggressive optimizations on floating-point data|
|-fp-model precise|[Improves precision] Disables optimizations that are not value-safe on floating-point data|
Default: -fp-model fast=1
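
These options can change results because floating-point addition is not associative: an optimizer that is allowed to reorder a sum can produce a different answer from the order written in the source. A minimal Python sketch with illustrative values, not tied to any specific compiler transformation:

```python
def sum_left_to_right(a, b, c):
    """Evaluate (a + b) + c, the order written in the source."""
    return (a + b) + c


def sum_reassociated(a, b, c):
    """Evaluate a + (b + c), an order a value-unsafe optimizer might pick."""
    return a + (b + c)


if __name__ == "__main__":
    a, b, c = 1.0e16, -1.0e16, 1.0
    print(sum_left_to_right(a, b, c))
    print(sum_reassociated(a, b, c))
```

Here the first order gives 1.0 and the second gives 0.0, because 1.0 is lost when it is added to -1.0e16. If a program is sensitive to such differences, -fp-model precise is the safer choice.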

--Debug Options
---ifort Only
|~Options|~Description|
|-traceback -g|When a severe error occurs, the source file, routine name, and line number are displayed along with the call stack hexadecimal addresses (program counter trace).|
|-traceback -g -check bounds|Enables run-time checking of array subscript and character substring expressions.|
|-traceback -g -fpe0|If a floating-point invalid, divide-by-zero, or overflow exception occurs, execution is aborted.|
(*) Specifying -g turns off -O2 and makes -O0 the default unless -O2 is explicitly specified on the same command line.
&br;Debug options may slow down your programs, so remove them once debugging is finished.

**Specific Memory Model [#zd4b6f2b]
By default, the compiler restricts code and data to the first 2GB of address space.
If you link without the appropriate memory model and dynamic library options, an error message in the following format is displayed:

 relocation truncated to fit: R_X86_64_32S against `.bss'
 relocation truncated to fit: R_X86_64_32S against `.bss'
~
Specifying -mcmodel=medium or -mcmodel=large also sets the -shared-intel option.
|~Options|~Description|
|-mcmodel=small (default)|Tells the compiler to restrict code and data to the first 2GB of address space.|
|-mcmodel=medium|Tells the compiler to restrict code to the first 2GB; it places no memory restriction on data. |
|-mcmodel=large|Places no memory restriction on code or data. |
|~Options|~Description|
|-shared-intel|This option causes Intel-provided libraries to be linked in dynamically.|
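
To judge whether a program is likely to need -mcmodel=medium, it can help to estimate the size of its static arrays against the 2GB limit. The sketch below is a rough aid only. It assumes the limit is 2**31 bytes and that the listed arrays dominate the static data; the helper names and the array shape are hypothetical.

```python
from math import prod

TWO_GB = 2 ** 31  # assumed byte limit of the small memory model


def array_bytes(shape, bytes_per_element):
    """Size in bytes of one statically allocated array."""
    return prod(shape) * bytes_per_element


def exceeds_small_model(arrays):
    """True if the summed static array sizes go beyond the 2GB limit."""
    total = sum(array_bytes(shape, size) for shape, size in arrays)
    return total > TWO_GB


if __name__ == "__main__":
    # Hypothetical program with one real(8) array a(20000,20000).
    arrays = [((20000, 20000), 8)]
    print(array_bytes(*arrays[0]), exceeds_small_model(arrays))
```

A 20000 x 20000 real(8) array occupies 3,200,000,000 bytes, which is above the limit, so such a program would be a candidate for -mcmodel=medium.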

**Math Kernel Library (MKL) [#t799fdd4]
-MKL provides the following mathematical routines.

--BLAS
--BLACS
--LAPACK
--ScaLAPACK
--PBLAS
--sparse solvers
--Vector Math (VML)
--Vector Statistics (VSL)
--Fast Fourier Transform
--FFTW interface for Fast Fourier Transform
~
and others.

-How to link the serial or multi-threaded version
--Serial version
 $ ifort -o a.out -mkl=sequential
--Multi-threaded version
 $ ifort -o a.out -mkl=parallel

-ex.1) Vector inner product using the SDOT routine
 $ cat test1.f
       program test1
       ! Dot product of x(1), x(3), ..., x(9) with y(1), ..., y(5)
       real x(10), y(10), sdot, res
       integer n, incx, incy, i
       external sdot
       n = 5
       incx = 2
       incy = 1
       do i = 1, 10
          x(i) = real(i)
          y(i) = 1.0e0
       enddo
       res = sdot(n, x, incx, y, incy)
       print*,'SDOT = ', res
       stop
       end
 
 $ ifort  -O3 -xAVX test1.f  -mkl=sequential
 $ dplace ./a.out
 
 SDOT =    25.00000
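
The expected value of 25 can be checked by hand with a reference version of the strided sum. The Python sketch below mirrors only the index arithmetic of the call above for positive increments; it is not MKL code, and the function name is hypothetical.

```python
def strided_dot(n, x, incx, y, incy):
    """Sum of x[i*incx] * y[i*incy] for i = 0..n-1 (positive increments only)."""
    return sum(x[i * incx] * y[i * incy] for i in range(n))


if __name__ == "__main__":
    x = [float(i) for i in range(1, 11)]  # x(1..10) = 1, 2, ..., 10
    y = [1.0] * 10
    print(strided_dot(5, x, 2, y, 1))  # 1 + 3 + 5 + 7 + 9
```

With n = 5 and incx = 2, every second element of x is used, so the sum is 1 + 3 + 5 + 7 + 9 = 25.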

-ex.2) Using the FFTW interface to the FFT in MKL
 $ cat test2.f
 
 .... FFTW source code ......
 
 $ ifort -O3 -xAVX test2.f -I${MKLROOT}/include/fftw  -mkl=sequential
 $ dplace ./a.out


**Time Functions [#sc735795]

-Fortran
--dclock
~
Returns the elapsed time in seconds since 0:00 of the current day. The return value is of type real(8).
~
 real(8) time1, dclock
 time1 = dclock()
ex.)
 $ cat test3.f
 
       program test3
       real*8 dclock, t1, t2
       ! Time the call to sub() with dclock
       t1 = dclock()
       call sub()
       t2 = dclock()
       write(6,*) "time :", t2 - t1
       end
 
       subroutine sub()
       ! Stand-in workload: sleep for 3 seconds
       call system("sleep 3")
       return
       end
 
 $ ifort -O3  -xAVX test3.f
 $ dplace ./a.out
  time :   3.01978499999677
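
If dclock counts from 0:00 as described above, a measurement that spans midnight yields a negative difference. A minimal Python sketch of one possible correction, assuming the readings are in seconds and wrap at most once; the helper name is hypothetical:

```python
SECONDS_PER_DAY = 86400.0


def elapsed_seconds(t_start, t_end):
    """Difference of two time-of-day readings, corrected for one midnight wrap."""
    delta = t_end - t_start
    if delta < 0.0:
        delta += SECONDS_PER_DAY
    return delta


if __name__ == "__main__":
    print(elapsed_seconds(10.0, 13.0))     # same day
    print(elapsed_seconds(86399.0, 2.0))   # started just before midnight
```

The same adjustment can be written in Fortran around t2 - t1 for jobs that may run past midnight.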

**Performance Analysis Tool [#rc555f9b]
-SGI Perfsuite
~
PerfSuite identifies which functions, routines, and source lines consume the most run time.
Run your program under the psrun command.
When the job finishes, the psrun result file is available in the current directory.
Use psprocess to format the result file.
~
ex)
 $ dplace psrun ./a.out
 
 $ ls *.xml
 a.out.78075.eicp1.xml
 
The psrun result file is named as follows:
~
"program-name.pid.hostname.xml"
~
 
 $ psprocess a.out.78075.eicp1.xml
 
 Samples   Self %  Total %  Function
    3770   69.43%   69.43%  FUNC1__
     420    7.73%   77.16%  FUNC3_
     412    7.59%   84.75%  FUNC-tmp4_
     262    4.83%   89.58%  SUB-diff_
     133    2.45%   92.03%  SUB_init_
      11    2.03%   94.05%  SUB_out_
Samples  : Number of samples attributed to the function~
Self %   : Percentage of all samples~
Total %  : Cumulative percentage~
Function : Function name
~
~
If the program is compiled with -g, psrun also provides source-line profiling.
 Samples    Self%   Total%   Function:  File:Line
     601  10.20%  10.20%   FUNC1:/home/t2.f:556
     466   7.91%  18.12%   FUNC1:/home/t2.f:389
     258   4.38%  22.50%   FUNC1:/home/t2.f:383
     252   4.28%  26.77%   FUNC1:/home/t2.f:451
     233   3.96%  30.73%   FUNC1:/home/t2.f:178
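
When a profile is long, the rows can be filtered with a small script. The Python sketch below assumes the four-column row layout shown above (sample count, two percentages, name) and skips every other line; the function names and the threshold are hypothetical, and psprocess itself is not involved.

```python
def parse_profile(text):
    """Return (samples, self_pct, total_pct, name) tuples from profile rows."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have four fields and start with an integer sample count.
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        samples = int(fields[0])
        self_pct = float(fields[1].rstrip("%"))
        total_pct = float(fields[2].rstrip("%"))
        rows.append((samples, self_pct, total_pct, fields[3]))
    return rows


def hot_spots(rows, threshold_pct):
    """Names of entries whose own share is at least threshold_pct."""
    return [name for _, self_pct, _, name in rows if self_pct >= threshold_pct]
```

For example, saving the psprocess output to a file and calling hot_spots(parse_profile(text), 5.0) lists the entries that account for at least 5% of the samples each. The same parser accepts the source-line rows, where the name field has the form FUNC1:/home/t2.f:556.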

