*Serial processing programming [#d576a6ab]

This page summarizes [[porting your programs from the former system to the current system>Porting your programs]].

**Compiling Command [#cbaa47fc]
--for Fortran programs
--for C programs
--for C++ programs

We recommend the Intel compilers, which generate well-optimized code for Xeon processors.
The GNU compilers can also be used.

**Compiler Options [#cb56fd9a]
-Optimization Options
--Recommended optimization options
The following optimization options are recommended for programs that are already free of bugs.
---eic or eich
 -O3 -xAVX
---eicp
 -O3 -xCORE-AVX2

--Optimization Options
|-O0|Disables all optimizations|
|-O1|Enables optimizations for speed and disables some optimizations that increase code size and affect speed.|
|-O2|Enables optimizations for speed. This is the generally recommended optimization level (default).|
|-O3|Performs O2 optimizations and enables more aggressive loop transformations.|

--Code Generation Options
|-xAVX|May generate AVX instructions|
|-xCORE-AVX2|May generate AVX2 instructions|
AVX instructions operate on 256-bit vectors, so a single instruction can process:
--- eight single-precision floating-point values (8 x 32 bit = 256 bit)
--- four double-precision floating-point values (4 x 64 bit = 256 bit)
Processors targeted by -xCORE-AVX2 also support FMA (Fused Multiply-Add).
FMA computes an expression of the form a*b + c in a single instruction.

---eic and eich support the -xAVX option.
---eicp supports the -xAVX and -xCORE-AVX2 options.
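As a rough illustration of the kind of loop these options target, the following C sketch derives the lane counts listed above and shows a multiply-add loop. The routine name `axpy` and the enum names are hypothetical and chosen only for this example. Whether the compiler actually emits AVX or FMA instructions for it depends on the compiler, the options, and the target processor.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements that fit in one 256-bit AVX vector register. */
enum {
    AVX_BITS         = 256,
    AVX_FLOAT_LANES  = AVX_BITS / (8 * (int)sizeof(float)),   /* 8 when float is 32 bits  */
    AVX_DOUBLE_LANES = AVX_BITS / (8 * (int)sizeof(double))   /* 4 when double is 64 bits */
};

/* y[i] = a * x[i] + y[i]: each iteration is one multiply followed by one
 * add, the a*b + c form that a compiler may fuse into a single FMA
 * instruction when the target supports it.
 * Hypothetical example routine, not part of any library. */
void axpy(size_t n, double a, const double *x, double *y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

Calling `axpy(4, 2.0, x, y)` on four-element arrays covers exactly one 256-bit vector of doubles.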

--Floating point Operation Options
|-no-prec-div|[Improves performance] Enables optimizations of floating-point division that may give slightly less precise results|
|-fp-model fast [=1/2]|[Improves performance] Enables more aggressive optimizations on floating-point data|
|-fp-model precise|[Improves precision] Disables optimizations that are not value-safe on floating-point data|
Default: -fp-model fast=1
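To see why some optimizations are not "value-safe", note that floating-point addition is not associative: reordering a sum can change the rounded result. The C sketch below evaluates the same three-term sum in two orders. The function names are hypothetical, and the example only shows the arithmetic effect; it does not show what any particular compiler does under a given -fp-model setting.

```c
/* (a + b) + c, evaluated strictly left to right. */
double sum_left(double a, double b, double c)
{
    volatile double t = a + b;   /* volatile forces the intermediate to be stored as a double */
    return t + c;
}

/* a + (b + c): mathematically identical, but rounded differently. */
double sum_right(double a, double b, double c)
{
    volatile double t = b + c;
    return a + t;
}

/* Returns 1 if the two evaluation orders disagree for the given inputs. */
int order_matters(double a, double b, double c)
{
    return sum_left(a, b, c) != sum_right(a, b, c);
}
```

With a = 1.0e17, b = -1.0e17, and c = 1.0, the first order gives 1.0, while the second gives 0.0 because 1.0 is lost when it is added to -1.0e17.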

--Debug Options
---ifort Only
|-traceback -g|When a severe error occurs, the source file, routine name, and line number are displayed along with the hexadecimal call-stack addresses (program counter trace).|
|-traceback -g -check bounds|Enables run-time checking of array subscript and character substring expressions.|
|-traceback -g -fpe0|Aborts execution when a floating-point invalid, divide-by-zero, or overflow exception occurs.|
(*)Specifying -g turns off -O2 and makes -O0 the default unless -O2 is explicitly specified on the same command line.
&br;Debug options may slow down your programs, so remove them once debugging is done.

**Specific Memory Model [#zd4b6f2b]
By default, the compiler restricts code and data to the first 2GB of address space.
If your program needs more and you do not specify the appropriate memory model and dynamic library options, linking fails with an error message in this format:

 relocation truncated to fit: R_X86_64_32S against `.bss'
 relocation truncated to fit: R_X86_64_32S against `.bss'
Specifying -mcmodel=medium or -mcmodel=large also sets -shared-intel.
|-mcmodel=small (default)|Tells the compiler to restrict code and data to the first 2GB of address space. |
|-mcmodel=medium|Tells the compiler to restrict code to the first 2GB; it places no memory restriction on data. |
|-mcmodel=large|Places no memory restriction on code or data. |
|-shared-intel|This option causes Intel-provided libraries to be linked in dynamically.|
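A quick way to judge whether a program is likely to hit this limit is to add up the sizes of its statically allocated arrays. The C sketch below does this check for a single array. The helper name `exceeds_small_model` is hypothetical, and the check deliberately ignores code size and other static data, so treat it as a lower bound only. Memory obtained at run time (for example with malloc or ALLOCATE) is not subject to this limit.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound on code plus statically allocated data under -mcmodel=small. */
#define SMALL_MODEL_LIMIT ((uint64_t)2 * 1024 * 1024 * 1024)   /* 2 GB */

/* Returns 1 if a static array of n_elems elements of elem_size bytes would,
 * on its own, exceed the small-model limit (hypothetical helper). */
int exceeds_small_model(uint64_t n_elems, uint64_t elem_size)
{
    assert(elem_size > 0);
    /* Divide instead of multiplying so the comparison cannot overflow. */
    return n_elems > SMALL_MODEL_LIMIT / elem_size;
}
```

For example, a static array of 300,000,000 double-precision values occupies about 2.4 GB, so `exceeds_small_model(300000000, 8)` returns 1 and such a program would need -mcmodel=medium or -mcmodel=large.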

**Math Kernel Library (MKL) [#t799fdd4]
-MKL provides the following mathematical routines:

--sparse solvers
--Vector Math (VML)
--Vector Statistics (VSL)
--Fast Fourier Transform
--FFTW interface for Fast Fourier Transform

-How to link the serial or the multi-threaded version
--Serial version
 $ ifort -o a.out -mkl=sequential
--Multi-threaded version
 $ ifort -o a.out -mkl=parallel

-ex.1) Vector inner product using the SDOT routine
 $ cat test1.f
       program test1
       real x(10), y(10), sdot, res
       integer n, incx, incy, i
       external sdot
       n = 5
       incx = 2
       incy = 1
       do i = 1, 10
          x(i) = real(i)
          y(i) = 1.0e0
       end do
       res = sdot(n, x, incx, y, incy)
       print*,'SDOT = ', res
       end
 $ ifort  -O3 -xAVX test1.f  -mkl=sequential
 $ dplace ./a.out
 SDOT =    25.00000
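The result 25 follows from the increments: with n = 5, incx = 2, and incy = 1, SDOT multiplies x(1), x(3), x(5), x(7), x(9) by y(1) through y(5), giving 1 + 3 + 5 + 7 + 9 = 25. The C sketch below reproduces that strided sum for positive increments. The function `sdot_ref` is a hypothetical reference helper written for this explanation; it is not the MKL routine and does not handle negative increments.

```c
#include <assert.h>

/* Reference strided dot product for positive increments:
 * sum over i = 0 .. n-1 of x[i*incx] * y[i*incy] (0-based C indexing).
 * Hypothetical helper for illustration; it is not the MKL routine. */
float sdot_ref(int n, const float *x, int incx, const float *y, int incy)
{
    assert(incx > 0 && incy > 0);
    float sum = 0.0f;
    for (int i = 0; i < n; i++) {
        sum += x[i * incx] * y[i * incy];
    }
    return sum;
}
```

Filling `x` with 1.0 through 10.0 and `y` with ones, as in test1.f, `sdot_ref(5, x, 2, y, 1)` returns 25.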

-ex.2) Using the FFTW interface to the FFT in MKL
 $ cat test2.f
 .... FFTW source code ......
 $ ifort -O3 -xAVX test2.f -I${MKLROOT}/include/fftw  -mkl=sequential
 $ dplace ./a.out

**Time Functions [#sc735795]

dclock returns the elapsed time in seconds since 0:00 of the current day. The return value has the real(8) data type.
 real(8) time1, dclock
 time1 = dclock()
 $ cat test3.f
       program test3
       real*8 dclock, t1, t2
       t1 = dclock()
       call sub()
       t2 = dclock()
       write(6,*) "time :", t2 - t1
       end
       subroutine sub()
       call system("sleep 3")
       end
 $ ifort -O3  -xAVX test3.f
 $ dplace ./a.out
  time :   3.01978499999677
(*)To measure the elapsed time of a Fortran program that keeps running into the following day, use a wrapper routine around gettimeofday instead.
dclock.c converts the value returned by gettimeofday from microseconds to seconds.
 $ cat dclock.c
  #include <sys/time.h>
  double  dclock_() {
      struct timeval tp;
      struct timezone tzp;
      gettimeofday(&tp, &tzp);
      return tp.tv_sec + tp.tv_usec * 1.0e-6;
  }
Below is an example of compiling test3.f and linking it with dclock.c.
 $ icc  -c  dclock.c
 $ ifort  -O3 -xAVX  -o  a.out  test3.f  dclock.o
 $ dplace  ./a.out
  time :   3.01569199562073
In the above example, gettimeofday measures the elapsed time of test3.f.

gettimeofday returns the seconds and microseconds elapsed since 00:00 Jan 1, 1970.
Its return value has an integer data type.
If an error occurs, the value is -1, otherwise 0.
ex) The following function returns the elapsed time in seconds
 $ cat elapsed.c
      #include <sys/time.h>
      double elapsed() {
          struct timeval tp;
          struct timezone tzp;
          gettimeofday(&tp, &tzp);
          return tp.tv_sec + tp.tv_usec * 1.0e-6;
      }
 $ cat test5.c
      int main(void)
      int i;
      float s=0; 
      double ts, te;
      double elapsed();
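A self-contained sketch of this timing pattern is shown below, under stated assumptions: `elapsed` wraps gettimeofday as described above, while `timed_sum` and its summation loop are a hypothetical workload invented for this example and are not taken from the original test program.

```c
#include <stddef.h>
#include <sys/time.h>

/* Wall-clock time in seconds since 00:00 Jan 1, 1970, built from gettimeofday. */
double elapsed(void)
{
    struct timeval tp;
    gettimeofday(&tp, NULL);   /* the timezone argument is obsolete, so NULL is passed */
    return (double)tp.tv_sec + (double)tp.tv_usec * 1.0e-6;
}

/* Hypothetical workload: sums 0 .. n-1 and stores the elapsed seconds
 * in *seconds. Returns the sum. */
double timed_sum(long n, double *seconds)
{
    double ts = elapsed();          /* start time */
    double s = 0.0;
    for (long i = 0; i < n; i++) {
        s += (double)i;
    }
    double te = elapsed();          /* end time */
    *seconds = te - ts;
    return s;
}
```

Because gettimeofday reports wall-clock time, the measured interval includes time spent waiting as well as time spent computing, and it can be distorted if the system clock is adjusted during the run.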

**Performance Analyzing Tool [#rc555f9b]
-SGI Perfsuite
PerfSuite identifies which functions, routines, and source lines consume the most run time.
Run your program under the psrun command.
When the job finishes, psrun writes its result files to the current directory.
Use psprocess to format those result files.
 $ dplace psrun ./a.out
 $ ls *.xml
Formatting the psrun result with psprocess gives:
 $ psprocess a.out.78075.eicp1.xml
 Samples   Self %  Total %  Function
    3770   69.43%   69.43%  FUNC1__
     420    7.73%   77.16%  FUNC3_
     412    7.59%   84.75%  FUNC-tmp4_
     262    4.83%   89.58%  SUB-diff_
     133    2.45%   92.03%  SUB_init_
      11    2.03%   94.05%  SUB_out_
Samples  : Number of samples taken in the function~
Self %   : Percentage of all samples taken in the function~
Total %  : Cumulative percentage, summed from the top of the list~
Function : Function name
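The relationship between the three numeric columns can be sketched in C as follows. The helper `profile_percentages` is hypothetical. The total of 5430 samples used in the usage note is an assumption inferred from the first row of the listing (3770 samples at 69.43%); it is not printed in the output above.

```c
#include <assert.h>
#include <stddef.h>

/* Fills self_pct[i] with samples[i] as a percentage of total_samples and
 * total_pct[i] with the running sum of self_pct[0..i].
 * Hypothetical helper for illustration; it is not part of PerfSuite. */
void profile_percentages(const long *samples, size_t n, long total_samples,
                         double *self_pct, double *total_pct)
{
    assert(total_samples > 0);
    double running = 0.0;
    for (size_t i = 0; i < n; i++) {
        self_pct[i] = 100.0 * (double)samples[i] / (double)total_samples;
        running += self_pct[i];
        total_pct[i] = running;
    }
}
```

With the sample counts 3770, 420, and 412 and the assumed total of 5430, this gives Self % values of about 69.43, 7.73, and 7.59 and Total % values of about 69.43, 77.16, and 84.75, matching the first three rows of the listing.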
If the program is compiled with -g, psrun also provides source-line profiling.
 Samples    Self%   Total%   Function:  File:Line
     601  10.20%  10.20%   FUNC1:/home/t2.f:556
     466   7.91%  18.12%   FUNC1:/home/t2.f:389
     258   4.38%  22.50%   FUNC1:/home/t2.f:383
     252   4.28%  26.77%   FUNC1:/home/t2.f:451
     233   3.96%  30.73%   FUNC1:/home/t2.f:178
