
Day 2, Workshop Programming of Heterogeneous Systems in Physics

The original post is here: eklausmeier.goip.de/blog/2014/07-17-day-2-workshop-programming-of-heterogeneous-systems-in-physics.


Day 2 of the conference had the talks below. Prof. Dr. Bernd Brügmann gave a short introduction. He pointed out that Jena ranks number 10 in physics in Germany and has about 100,000 inhabitants and 20,000 students.

  1. Dr. Karl Rupp, Vienna, Lessons Learned in Developing the Linear Algebra Library ViennaCL. Notes: C++ operator overloading normally creates temporaries, special tricks needed to avoid them, ViennaCL not callable from Fortran because of C++ operator overloading, Eigen (eigen.tuxfamily.org), Karl Rupp's slides, OpenCL and CUDA more or less on par as of CUDA 5 and 6
  2. Prof. Dr. Rainer Heintzmann, Jena, CudaMat - a toolbox for Cuda computations. Photo. Notes: information on CudaMat, 300 GB fly-head data, Delft Image Processing Library, wrote his own CUDA memory allocator that takes its storage from a heap, does not work with Octave
  3. Prof. Dipl.-Ing. Dr. Gundolf Haase, Graz, Interpolation with Radial Basis Functions on GPGPUs using CUDA. Photo. Notes: AVL Graz (car industry, simulation software), OpenACC disappointing, significant speedup with GPU/CUDA, rule of thumb: start with OpenMP, then MIC (Intel Xeon Phi), then OpenACC
  4. Lars Kühne, Jena, A Concurrent Algorithm for Computing the Flow Complex. Photo
  5. Axel Hübl, Helmholtz-Zentrum Dresden-Rossendorf, Scaling Plasma Simulations to more than 18,000 GPUs. Photo
  6. Carsten Eye Frigaard, www.lab4241.com, Running GADGET2 on GPUs: Optimizing Tree-search Algorithms by Detailed Profiling of GPU Code. Photo. Notes: gpuprofgui, C-source level counters, PTX level counters, SASS level counters, BARRA, UNISIM
  7. M. Sc. Moritz Kreutzer, Erlangen, Building blocks for sparse linear algebra on heterogeneous hardware. Photo. Notes: excellent talk, 45% comes from accelerators in the Top50 supercomputers, vulnerability to hardware faults, fusing kernels, checkpoints, ESSEX programme/project, JDS, CRS, SSE, AVX, Sliced ELLPACK, computation done in permuted order
  8. Dipl.-Phys. Marcus Noack, Oslo, Parallel and simultaneous computation of eikonal and transport equations by taking full advantage of GPU computer architecture. Photo. Notes: Oil, seismic, just a single CUDA kernel, used OpenMP
  9. Dr. Manfred Liebmann, Graz, Optimal Control of the Schrödinger Equation on Many-Core Architectures. Photo. Notes: Crank-Nicolson much worse, Intel compiler not better than gcc/g++, 10+ PDEs per iteration, good initial approximation necessary, GPU two times faster, unitarity not a problem
  10. Dr. Johannes Langguth, Oslo, Scalable Finite Volume Computations in Heterogeneous Systems. Photo
  11. Dipl.-Inf. Ralf Seidler, Jena, Implementing the Radon Transform using Advanced Techniques on GPGPUs. Photo. Notes: GTX 750 consumer card, single precision no problem
  12. Prof. Dr. Gerhard Zumbusch, Jena, A parallel functional language for high performance finite difference stencil codes. Photo. Notes: very interesting, excellent presentation, large gap between GFlop/s and memory bandwidth, so operations have to be fused, Runge-Kutta discretization, you are measuring memory speed, not computational speed
  13. Mohammed Sourouri, Oslo, An Optimized Intra-Node Communication Scheme Using Multiple CUDA Streams and OpenMP Threads. Photo
  14. Carsten Eckert, Helmholtz-Zentrum Dresden-Rossendorf, An adaptive, load-balanced MPI/GPU-Code for calculating the gain in High Power Laser media. Photo. Notes: Arch Linux, 64 GPUs, all communication via MPI, 1 point = 1 kernel, Tesla K20m, Computational Radiation Physics, Monte Carlo integration
  15. Dr. Erik Rodner, Jena, Computational Challenges for Visual Recognition with Deep Learning Architectures. Photo
  16. Dipl.-Phys. Richard Pausch, Dresden, Scalable, interactive 3D in-situ visualization of large-scale Simulations. Photo
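Karl Rupp's remark that C++ operator overloading creates temporaries is easy to see in code. The sketch below is my own illustration, not ViennaCL's implementation: it contrasts a naive addition that allocates a new vector per operator with a minimal expression template, one common way to avoid those temporaries. The names `Vec`, `Sum`, `is_expr` and `add_naive` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Naive style: each addition allocates a temporary vector,
// so x + y + z costs two temporaries and two passes over memory.
std::vector<double> add_naive(const std::vector<double>& x,
                              const std::vector<double>& y) {
  std::vector<double> r(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) r[i] = x[i] + y[i];
  return r;
}

// Expression template: a Sum only records its operands; element i is
// computed on demand. Operands are held by reference, so an expression
// must be consumed within the statement that builds it.
template <typename L, typename R>
struct Sum {
  const L& l;
  const R& r;
  double operator[](std::size_t i) const { return l[i] + r[i]; }
  std::size_t size() const { return l.size(); }
};

struct Vec {
  std::vector<double> data;
  explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
  double operator[](std::size_t i) const { return data[i]; }
  std::size_t size() const { return data.size(); }

  // Assigning an expression evaluates it in a single loop, no temporaries.
  template <typename E>
  Vec& operator=(const E& e) {
    assert(e.size() == data.size());
    for (std::size_t i = 0; i < data.size(); ++i) data[i] = e[i];
    return *this;
  }
};

// Restrict operator+ to our own expression types.
template <typename T> struct is_expr : std::false_type {};
template <> struct is_expr<Vec> : std::true_type {};
template <typename L, typename R> struct is_expr<Sum<L, R>> : std::true_type {};

template <typename L, typename R,
          typename = std::enable_if_t<is_expr<L>::value && is_expr<R>::value>>
Sum<L, R> operator+(const L& l, const R& r) {
  return {l, r};
}
```

With `Vec a(4, 1.0), b(4, 2.0), c(4, 3.0), d(4);` the statement `d = a + b + c;` builds a small tree of `Sum` objects and fills `d` in one loop. Holding operands by reference keeps the sketch short; production libraries usually store nested expressions by value so that they can outlive a single statement.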
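My notes on Rainer Heintzmann's CudaMat talk only say that he wrote his own CUDA memory allocator taking storage from a heap; they do not say how it works. One common design, sketched here purely as an assumption, is a caching allocator: freed blocks are kept and handed out again, so the expensive device allocation is called rarely. `CachingAllocator` is a hypothetical name, and `std::malloc`/`std::free` stand in for `cudaMalloc`/`cudaFree` so the sketch runs on the host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <new>
#include <stdexcept>

// Hypothetical caching allocator: freed blocks go into a size-indexed cache
// and are reused, so the backend allocator is only called when no cached
// block of the requested size exists. std::malloc/std::free stand in for
// cudaMalloc/cudaFree so that the sketch runs on the host.
class CachingAllocator {
 public:
  CachingAllocator() = default;
  CachingAllocator(const CachingAllocator&) = delete;
  CachingAllocator& operator=(const CachingAllocator&) = delete;

  // Releases every block, both cached and still handed out.
  ~CachingAllocator() {
    for (auto& entry : cache_) std::free(entry.second);
    for (auto& entry : live_) std::free(entry.first);
  }

  void* allocate(std::size_t bytes) {
    assert(bytes > 0);
    void* p = nullptr;
    auto it = cache_.find(bytes);  // exact-size reuse only, for simplicity
    if (it != cache_.end()) {
      p = it->second;
      cache_.erase(it);
    } else {
      p = std::malloc(bytes);      // a device version would call cudaMalloc
      if (p == nullptr) throw std::bad_alloc();
      ++backend_calls_;
    }
    live_[p] = bytes;
    return p;
  }

  void deallocate(void* p) {
    auto it = live_.find(p);
    if (it == live_.end()) throw std::invalid_argument("unknown pointer");
    cache_.emplace(it->second, p);  // keep the block instead of freeing it
    live_.erase(it);
  }

  std::size_t backend_calls() const { return backend_calls_; }

 private:
  std::multimap<std::size_t, void*> cache_;  // free blocks, keyed by size
  std::map<void*, std::size_t> live_;        // handed-out blocks and sizes
  std::size_t backend_calls_ = 0;
};
```

Allocating 1024 bytes, freeing them, and allocating 1024 bytes again returns the same pointer with only one backend call. Exact-size matching is the simplest policy; real allocators typically round sizes up to buckets or split larger cached blocks.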
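Moritz Kreutzer's talk mentioned CRS, Sliced ELLPACK and computation in permuted order. As a rough illustration (my own sketch, not code from the talk or the ESSEX project), here is sparse matrix-vector multiplication in CRS and in a Sliced ELLPACK layout: rows are grouped into chunks of height `C`, each chunk is padded to its longest row and stored column-major, and rows are first sorted by length within windows of `sigma` rows to reduce padding. The kernel therefore computes the result in permuted order and scatters it back at the end. All type and function names, and the parameters `C` and `sigma`, are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Compressed Row Storage (CRS): nonzeros stored row by row.
struct Crs {
  std::size_t n;                    // number of rows (square matrix)
  std::vector<std::size_t> rowptr;  // size n + 1
  std::vector<std::size_t> col;
  std::vector<double> val;
};

std::vector<double> spmv_crs(const Crs& A, const std::vector<double>& x) {
  std::vector<double> y(A.n, 0.0);
  for (std::size_t i = 0; i < A.n; ++i)
    for (std::size_t k = A.rowptr[i]; k < A.rowptr[i + 1]; ++k)
      y[i] += A.val[k] * x[A.col[k]];
  return y;
}

// Sliced ELLPACK with chunk height C and sorting window sigma.
struct SellC {
  std::size_t n, C;
  std::vector<std::size_t> perm;       // perm[r] = original row of stored row r
  std::vector<std::size_t> chunk_ptr;  // start offset of each chunk
  std::vector<std::size_t> chunk_len;  // padded row length of each chunk
  std::vector<std::size_t> col;
  std::vector<double> val;             // padding entries: val 0, col 0
};

SellC to_sell(const Crs& A, std::size_t C, std::size_t sigma) {
  assert(C > 0 && sigma > 0);
  SellC S{A.n, C, {}, {}, {}, {}, {}};
  auto len = [&](std::size_t i) { return A.rowptr[i + 1] - A.rowptr[i]; };
  S.perm.resize(A.n);
  std::iota(S.perm.begin(), S.perm.end(), 0);
  for (std::size_t s = 0; s < A.n; s += sigma)  // sort rows by length per window
    std::stable_sort(S.perm.begin() + s, S.perm.begin() + std::min(A.n, s + sigma),
                     [&](std::size_t a, std::size_t b) { return len(a) > len(b); });
  const std::size_t nchunks = (A.n + C - 1) / C;
  for (std::size_t c = 0; c < nchunks; ++c) {
    std::size_t width = 0;
    for (std::size_t r = c * C; r < std::min(A.n, (c + 1) * C); ++r)
      width = std::max(width, len(S.perm[r]));
    S.chunk_ptr.push_back(S.val.size());
    S.chunk_len.push_back(width);
    for (std::size_t j = 0; j < width; ++j)      // column-major inside a chunk
      for (std::size_t lane = 0; lane < C; ++lane) {
        const std::size_t r = c * C + lane;
        const bool real = r < A.n && j < len(S.perm[r]);
        const std::size_t k = real ? A.rowptr[S.perm[r]] + j : 0;
        S.col.push_back(real ? A.col[k] : 0);
        S.val.push_back(real ? A.val[k] : 0.0);
      }
  }
  return S;
}

// The C lanes of a chunk are processed in lockstep (SIMD lanes or GPU
// threads); y is built in permuted order and scattered back at the end.
std::vector<double> spmv_sell(const SellC& S, const std::vector<double>& x) {
  std::vector<double> yp(S.chunk_ptr.size() * S.C, 0.0);
  for (std::size_t c = 0; c < S.chunk_ptr.size(); ++c)
    for (std::size_t j = 0; j < S.chunk_len[c]; ++j)
      for (std::size_t lane = 0; lane < S.C; ++lane) {
        const std::size_t k = S.chunk_ptr[c] + j * S.C + lane;
        yp[c * S.C + lane] += S.val[k] * x[S.col[k]];
      }
  std::vector<double> y(S.n);
  for (std::size_t r = 0; r < S.n; ++r) y[S.perm[r]] = yp[r];
  return y;
}
```

The innermost loop over `lane` has unit stride in `val` and `col`, which is what makes this layout friendly to SSE/AVX and to coalesced GPU loads, at the price of the padding entries.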
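Gerhard Zumbusch's point that stencil codes measure memory speed rather than computational speed, so operations have to be fused, can be illustrated with one explicit time step of the 1D heat equation. This is my own sketch, not code from the talk (which was about a functional language generating such codes); the function names and the traffic model are assumptions. Both versions do the same arithmetic, but under the idealized model in the comments the unfused version moves 5n doubles per step and the fused one 2n, a factor of 2.5 less data, which on a memory-bound machine translates roughly into runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One explicit time step for u_t = u_xx on a 1D grid with fixed boundary
// values: u_new = u + dt * L(u), where L is the 3-point discrete Laplacian
// (the grid spacing is folded into dt to keep the sketch short).

// Unfused: two sweeps, and the intermediate L(u) goes through memory.
std::vector<double> step_unfused(const std::vector<double>& u, double dt) {
  const std::size_t n = u.size();
  assert(n >= 2);
  std::vector<double> lap(n), un(n);
  un[0] = u[0];
  un[n - 1] = u[n - 1];
  for (std::size_t i = 1; i + 1 < n; ++i)  // sweep 1: read u, write lap
    lap[i] = u[i - 1] - 2.0 * u[i] + u[i + 1];
  for (std::size_t i = 1; i + 1 < n; ++i)  // sweep 2: read u and lap, write un
    un[i] = u[i] + dt * lap[i];
  return un;
}

// Fused: a single sweep; L(u) at point i stays in registers.
std::vector<double> step_fused(const std::vector<double>& u, double dt) {
  const std::size_t n = u.size();
  assert(n >= 2);
  std::vector<double> un(n);
  un[0] = u[0];
  un[n - 1] = u[n - 1];
  for (std::size_t i = 1; i + 1 < n; ++i)  // read u, write un
    un[i] = u[i] + dt * (u[i - 1] - 2.0 * u[i] + u[i + 1]);
  return un;
}

// Idealized main-memory traffic per step in bytes: 8-byte doubles, stencil
// neighbours assumed to come from cache, boundaries and write-allocate ignored.
// Unfused: read u, write lap, read u, read lap, write un = 5 streams.
// Fused: read u, write un = 2 streams.
std::size_t bytes_unfused(std::size_t n) { return 5 * n * sizeof(double); }
std::size_t bytes_fused(std::size_t n) { return 2 * n * sizeof(double); }
```

For `u = {0, 1, 4, 9, 16}` and `dt = 0.25` both functions return `{0, 1.5, 4.5, 9.5, 16}`. With several Runge-Kutta stages per step the same reasoning applies to each intermediate stage array, which is why fusing across stages pays off.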