Genesis GE-i940 Tesla

On September 28, 2009, a Genesis GE-i940 Tesla workstation, based on both GPGPU* and NVIDIA CUDA** technologies, was installed at DSA/LabMNCP.

It is a testbed for developing advanced simulations in the following research fields:

  • Stochastic simulation;
  • Molecular Dynamics;
  • Atmospheric and climate modeling;
  • Weather forecast investigation;
  • Grid/Cloud Hybrid Virtualization.

*
“GPGPU stands for General-Purpose computation on Graphics Processing Units, also known as GPU Computing. Graphics Processing Units (GPUs) are high-performance many-core processors capable of very high computation and data throughput.”

**
“NVIDIA® CUDA™ is a general purpose parallel computing architecture that leverages the parallel compute engine in NVIDIA graphics processing units (GPUs) to solve many complex computational problems in a fraction of the time required on a CPU.”

Hardware
Mainboard   Asus X58/ICH10R: 3 PCI-Express x16, 6 SATA, 2 SAS, 3+6 USB
CPU         i7-940, 2.93 GHz, 133 MHz FSB, quad core, 8 MB cache
RAM         6 x 2 GB DDR3-1333 DIMM
Hard disk   2 x 500 GB SATA, 16 MB cache, 7,200 RPM
GPU         1 x Quadro FX 5800, 4 GB RAM
            2 x Tesla C1060, 4 GB RAM
Software
OS: GNU/Linux CentOS 5.3, 64-bit
Driver: NVIDIA CUDA 180.22, Linux 64-bit
VMware: VMware-server-2.0.2
Output of the first test:

Execution time (ms)    Serial simulation           GPU
malloc                              0.02        175.21
RndGnr                          51430.92       2283.19
init                              275.48          0.31
computing                      391391.12        329.19
I/O                             56822.77      64740.54
GPU/CPU                                -        198.43
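
The timings above can be read as per-phase speedups (serial time divided by GPU time). The following is a minimal Python sketch of that arithmetic, not part of the original test program. The function names are hypothetical, and it assumes that the single GPU/CPU value is a transfer cost paid only by the GPU run.

```python
# Timings in ms copied from the first-test table: (serial, GPU).
timings_ms = {
    "malloc": (0.02, 175.21),
    "RndGnr": (51430.92, 2283.19),
    "init": (275.48, 0.31),
    "computing": (391391.12, 329.19),
    "I/O": (56822.77, 64740.54),
}

# Assumption: the lone GPU/CPU figure is a host/device transfer cost
# that only the GPU run pays.
gpu_cpu_transfer_ms = 198.43


def phase_speedups(timings):
    """Return serial/GPU time ratio for each phase (values > 1 favor the GPU)."""
    return {phase: serial / gpu for phase, (serial, gpu) in timings.items()}


def overall_speedup(timings, gpu_extra_ms=0.0):
    """Return total serial time divided by total GPU time plus any extra GPU cost."""
    serial_total = sum(serial for serial, _ in timings.values())
    gpu_total = sum(gpu for _, gpu in timings.values()) + gpu_extra_ms
    return serial_total / gpu_total


if __name__ == "__main__":
    for phase, ratio in phase_speedups(timings_ms).items():
        print(f"{phase:<10} {ratio:12.4f}x")
    total = overall_speedup(timings_ms, gpu_cpu_transfer_ms)
    print(f"{'overall':<10} {total:12.4f}x")
```

Read this way, the computing phase gains about three orders of magnitude on the GPU, while I/O is essentially unchanged and accounts for most of the GPU run's total time.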

Output using the GPU:

device 0           : Quadro FX 5800
device 1           : Tesla C1060
device 2           : Tesla C1060

Selected device: 2 <<<<<<<<<<<<<<<<<<

device 2           : Tesla C1060
major/minor        : 1.3 compute capability
Total global mem   : -262144 bytes
Shared block mem   : 16384 bytes
RegsPerBlock       : 16384
WarpSize           : 32
MaxThreadsPerBlock : 512
TotalConstMem      : 65536 bytes
ClockRate          : 1296000 (kHz)
deviceOverlap      : 1
deviceOverlap      : 1
MultiProcessorCount: 30
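
The negative "Total global mem" value is most likely a display artifact rather than a hardware fault. One plausible reading, offered here as an assumption, is that the test program printed the memory size through a signed 32-bit integer, which cannot hold a value close to 4 GB. The Python sketch below shows that a size of 4 GiB minus 256 KiB wraps to exactly -262144 under that assumption; `as_int32` and `assumed_bytes` are illustrative names.

```python
def as_int32(value):
    """Reinterpret a non-negative integer as a signed 32-bit two's-complement value."""
    return (value + 2**31) % 2**32 - 2**31


# Assumption: the card reports 4 GiB minus 256 KiB of global memory.
assumed_bytes = 4 * 1024**3 - 256 * 1024

if __name__ == "__main__":
    print("assumed size in bytes :", assumed_bytes)
    print("as signed 32-bit value:", as_int32(assumed_bytes))
```

Values that fit in 31 bits, such as the 16384-byte shared memory size, are unaffected by this wraparound, which would explain why only the global memory line looks wrong.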


Using 1048576 particles 100 time steps
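
To relate the problem size to the device limits listed above, here is a small Python sketch of the launch arithmetic. It assumes a hypothetical mapping of one GPU thread per particle with blocks of MaxThreadsPerBlock threads; the original test program may use a different configuration.

```python
particles = 1048576            # from the program output (2**20)
time_steps = 100               # from the program output
max_threads_per_block = 512    # MaxThreadsPerBlock reported for the Tesla C1060


def blocks_needed(n_items, threads_per_block):
    """Ceiling division: number of blocks so that every item gets one thread."""
    return (n_items + threads_per_block - 1) // threads_per_block


if __name__ == "__main__":
    # Assumption: one thread per particle, full-size blocks.
    print("blocks per launch     :", blocks_needed(particles, max_threads_per_block))
    print("particle updates total:", particles * time_steps)
```

Because the particle count is a power of two, it divides evenly into full blocks under this assumption, so no thread would sit idle in a partially filled last block.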