Iwate Resident Research Journal (いわて駐在研究日誌)

OpenCAE, electronics projects, R/C, and whatever else comes to mind

Porting to CUDA Fortran

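  • CUDA Fortran file name examples

 *.cuf, *.CUF (the extension explicitly marks the file as a CUDA Fortran program)
 *.f, *.F, *.f90, *.F90, *.f95, *.F95 (conventional Fortran extensions; compile these with -Mcuda)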

  • The -Mcuda option

-Mcuda=cc20 (target compute capability 2.0; other values include cc13 and emu)
    fastmath (use the fast math library)
    cuda4.0 (use CUDA Toolkit 4.0)
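
To make the file naming and the -Mcuda option concrete, here is a minimal sketch of a CUDA Fortran program. The file name saxpy.cuf, the module name kernels, and the array size are made up for illustration, and the compile lines in the comments assume the PGI pgfortran driver is on the PATH.

```fortran
! saxpy.cuf (hypothetical file name)
! Possible compile lines, assuming the PGI compiler:
!   pgfortran saxpy.cuf                    (.cuf enables CUDA Fortran by itself)
!   pgfortran -Mcuda saxpy.f90             (same source under a conventional name)
!   pgfortran -Mcuda=fastmath saxpy.f90    (additionally use the fast math library)
module kernels
  use cudafor
  implicit none
contains
  ! Device kernel: each thread updates one element, y(i) = a*x(i) + y(i)
  attributes(global) subroutine saxpy(n, a, x, y)
    integer, value :: n
    real, value :: a
    real :: x(n), y(n)
    integer :: i
    ! Global 1-based index of this thread
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) y(i) = a * x(i) + y(i)
  end subroutine saxpy
end module kernels

program main
  use cudafor
  use kernels
  implicit none
  integer, parameter :: n = 1024
  real :: x(n), y(n)
  real, device :: x_d(n), y_d(n)

  x = 1.0
  y = 2.0
  ! Assignment between host and device arrays performs the transfer
  x_d = x
  y_d = y
  ! Launch enough 256-thread blocks to cover all n elements
  call saxpy<<<(n + 255) / 256, 256>>>(n, 2.0, x_d, y_d)
  y = y_d
  ! Every element should now equal 2.0*1.0 + 2.0
  print *, 'max error: ', maxval(abs(y - 4.0))
end program main
```

Several -Mcuda suboptions can be combined with commas, for example -Mcuda=fastmath,cuda4.0.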

  • pgaccelinfo

[waku@ensis10 bin]$ pgaccelinfo
CUDA Driver Version: 4000
NVRM version: NVIDIA UNIX x86_64 Kernel Module 270.41.19 Mon May 16 23:32:08 PDT 2011

Device Number: 0
Device Name: Quadro 5000
Device Revision Number: 2.0
Global Memory Size: 2683502592
Number of Multiprocessors: 11
Number of Cores: 352
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 49152
Registers per Block: 32768
Warp Size: 32
Maximum Threads per Block: 1024
Maximum Block Dimensions: 1024, 1024, 64
Maximum Grid Dimensions: 65535 x 65535 x 65535
Maximum Memory Pitch: 2147483647B
Texture Alignment: 512B
Clock Rate: 1026 MHz
Initialization time: 690996 microseconds
Current free memory: 2623266816
Upload time (4MB): 897 microseconds ( 728 ms pinned)
Download time: 1016 microseconds ( 660 ms pinned)
Upload bandwidth: 4675 MB/sec (5761 MB/sec pinned)
Download bandwidth: 4128 MB/sec (6355 MB/sec pinned)
[waku@ensis10 bin]$
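
Much of what pgaccelinfo reports can also be queried from inside a program. The following is a minimal sketch, assuming the cudafor module that ships with the PGI compiler; the program name devinfo is made up, and only a subset of the fields shown above is printed.

```fortran
! devinfo.cuf (hypothetical file name): print a few properties of each CUDA device
program devinfo
  use cudafor
  implicit none
  type(cudaDeviceProp) :: prop
  integer :: istat, ndev, i

  istat = cudaGetDeviceCount(ndev)
  if (istat /= cudaSuccess) then
    print *, 'cudaGetDeviceCount failed: ', cudaGetErrorString(istat)
    stop
  end if

  ! Device numbers are 0-based, as in the pgaccelinfo listing
  do i = 0, ndev - 1
    istat = cudaGetDeviceProperties(prop, i)
    if (istat /= cudaSuccess) cycle
    print '(a,i0)',        'Device Number: ', i
    print '(a,a)',         'Device Name: ', trim(prop%name)
    print '(a,i0,a,i0)',   'Device Revision Number: ', prop%major, '.', prop%minor
    print '(a,i0)',        'Global Memory Size: ', prop%totalGlobalMem
    print '(a,i0)',        'Number of Multiprocessors: ', prop%multiProcessorCount
    print '(a,i0)',        'Total Shared Memory per Block: ', prop%sharedMemPerBlock
    print '(a,i0)',        'Registers per Block: ', prop%regsPerBlock
    print '(a,i0)',        'Warp Size: ', prop%warpSize
    print '(a,i0)',        'Maximum Threads per Block: ', prop%maxThreadsPerBlock
    print '(a,i0,2(", ",i0))', 'Maximum Block Dimensions: ', prop%maxThreadsDim
    ! clockRate is reported in kHz; convert to MHz
    print '(a,i0,a)',      'Clock Rate: ', prop%clockRate / 1000, ' MHz'
  end do
end program devinfo
```

Two of the figures in the listing can be checked by hand. A compute capability 2.0 multiprocessor has 32 cores, so 11 multiprocessors give 11 x 32 = 352 cores. The bandwidth lines follow from the 4 MB transfer size: 4194304 bytes / 897 microseconds is about 4675 MB/sec. The pinned times labelled "ms" appear to be microseconds as well, since 4194304 / 728 is about 5761.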