PGI compilers deliver the performance you need on CPUs and the features you need for HPC application development on GPU-accelerated systems. OpenACC and CUDA programs can run several times faster on a single Tesla V100 GPU than on all the cores of a dual-socket CPU server, and they interoperate with MPI and OpenMP to deliver the full power of today’s multi-GPU servers.
Is your application tens or hundreds of thousands of lines of Fortran, C and C++ code? With OpenACC directives, you don’t have to parallelize all of it at once. You can identify hot loops and code regions using the PGPROF profiler, then parallelize and tune them one at a time. OpenACC code remains 100% standard-compliant and portable to other compilers and platforms, and the same source code can run in parallel on both CPUs and GPUs.
CloverLeaf, a Lagrangian-Eulerian explicit hydrodynamics mini-application, is a small (4,500-line), lightweight application representative of codes used at the United Kingdom’s Atomic Weapons Establishment (AWE). Using OpenACC, the fully optimized code runs four times faster on an NVIDIA V100 GPU than on a dual-socket, 40-core Intel Skylake CPU server on the bm32 data set. Using MPI+OpenACC, it scales to almost 15 times faster on four V100 GPUs. The source-code optimizations made while porting to the GPU with OpenACC also improved the performance of the CPU code by more than 50%.
HPC servers are quickly expanding beyond multicore x86 CPUs to include OpenPOWER and Arm CPUs and GPU accelerators. PGI Fortran, C and C++ compilers and OpenACC are designed to deliver high performance on all of these processors. PGI compilers for x86, OpenPOWER and GPUs are available now, including OpenACC parallelization across all cores of a multicore CPU or a GPU. PGI and OpenACC deliver the performance you need today and the flexibility you need tomorrow. PGI compilers can take you there.
The PGI Profiler is a powerful, easy-to-use interactive performance profiler for parallel programs written with OpenMP or OpenACC directives or with CUDA. Use it to visualize and analyze the performance of your Fortran, C and C++ programs. The PGI Profiler correlates execution time with procedures, source code and instructions, so you can quickly see where and how execution time is spent. Using resource utilization data and compiler feedback, it helps you understand why parts of your program have high execution times and how you can change your source code or compiler options to improve performance. The PGI Profiler is included with all PGI products.
The PGI Debugger, a graphical debugger for Fortran, C and C++, supports debugging of serial and parallel programs, including MPI, OpenMP and hybrid MPI/OpenMP applications. It can debug programs on SMP workstations, servers, distributed-memory clusters, and hybrid clusters in which each node contains multiple multicore x86 processors. It lets you control threads or processes individually or in groups and examine program state down to the register level. The PGI Debugger is included with all PGI products for x86-64 platforms.