Parallelizing Compilers for CPUs and GPUs
Use PGI Fortran, C and C++ compilers to develop performance-portable applications for multicore x86-64 or OpenPOWER CPUs and NVIDIA GPUs. PGI compilers support Fortran 2003, C++14 and selected C++17 features. With PGI you can parallelize programs for multicore CPUs and NVIDIA GPUs using OpenACC 2.6, for multicore CPUs using OpenMP 4.5, and for NVIDIA GPUs using CUDA Fortran. PGI compilers are used on the world’s fastest computers, including Summit at Oak Ridge National Laboratory, ranked #1 on the TOP500 list, for GPU-accelerated CFD, quantum chemistry, weather and climate, molecular dynamics, and astrophysics applications. PGI compilers are built for scientists and engineers working on systems ranging from workstations to the fastest GPU-powered supercomputers.
World-class CPU Performance, GPU Acceleration
PGI compilers deliver the performance you need on CPUs, and the features you need to develop HPC applications for GPU-accelerated systems. OpenACC and CUDA programs can run several times faster on a single Tesla V100 GPU than on all the cores of a dual-socket server, and they interoperate with MPI and OpenMP to deliver the full power of today’s multi-GPU servers.
Accelerate Your Code with OpenACC
Is your application tens or hundreds of thousands of lines of Fortran, C and C++ code? With OpenACC directives, you don’t have to parallelize all of it at once. You can identify hot loops and code regions using the PGPROF profiler, then parallelize and tune them incrementally, one at a time. OpenACC code remains 100% standard-compliant and portable to other compilers and platforms, and it enables parallel processing on CPUs and GPUs from identical source code.
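As a rough illustration of the incremental approach, here is a minimal sketch of a single hot loop annotated with one OpenACC directive. The function name `saxpy` and its arguments are hypothetical, chosen only for this example; they do not come from any PGI sample code.

```c
#include <stddef.h>

/* Hypothetical hot loop (illustrative only): y = a*x + y over n elements. */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    /* One directive marks this loop for parallel execution.
     * copyin: x is only read; copy: y is read and written back.
     * A compiler without OpenACC support ignores the pragma and
     * runs the loop serially, so the source stays standard C. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The rest of the application is left untouched. Other loops can be annotated later, one by one, as profiling identifies them.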
Performance Portability Delivered
CloverLeaf, a Lagrangian-Eulerian explicit hydrodynamics mini-application, is a small (4,500-line), lightweight application that is representative of a code used at the United Kingdom’s Atomic Weapons Establishment (AWE). Using OpenACC, the fully optimized code runs the bm32 data set four times faster on an NVIDIA V100 GPU than on a dual-socket, 40-core Intel Skylake server. It scales to almost 15 times faster on four V100 GPUs using MPI+OpenACC. The source code optimizations made while porting to the GPU with OpenACC also improved the performance of the CPU code by more than 50%.
Will Your Compiler Take You There?
HPC servers are quickly expanding beyond multicore x86 CPUs to include OpenPOWER and Arm CPUs and GPU accelerators. PGI Fortran, C and C++ compilers and OpenACC are designed to deliver high performance on all of these processors. PGI compilers for x86, OpenPOWER and GPUs are available now, including OpenACC parallelization across all cores of a multicore CPU or a GPU. PGI and OpenACC deliver the performance you need today, and the flexibility you need tomorrow. PGI compilers can take you there.
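To make the single-source idea concrete, here is a minimal sketch of a reduction loop that could be built for either a multicore CPU or a GPU without source changes. The function `dot` is a hypothetical example. In PGI releases of this generation the target was typically selected at compile time with options such as `-acc -ta=multicore` or `-acc -ta=tesla`; treat those as an assumption and check the documentation for your release.

```c
/* Hypothetical example (illustrative only): dot product of x and y. */
double dot(int n, const double *restrict x, const double *restrict y)
{
    double sum = 0.0;

    /* reduction(+:sum) tells the compiler to combine per-thread partial
     * sums. The same directive applies whether the loop is spread across
     * CPU cores or GPU threads; without OpenACC support the pragma is
     * ignored and the loop runs serially. */
    #pragma acc parallel loop reduction(+:sum) copyin(x[0:n], y[0:n])
    for (int i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

The choice of target is then a build decision rather than a source decision, which is the sense in which the code is performance-portable.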
Performance Profiling and Optimization
The PGI Profiler is a powerful, easy-to-use interactive performance profiler for parallel programs written with OpenMP or OpenACC directives, or with CUDA. Use it to visualize and analyze the performance of your Fortran, C and C++ programs. The PGI Profiler correlates execution time with procedures, source code and instructions, so you can quickly see where and how execution time is spent. Using resource utilization data and compiler feedback, it helps you understand why parts of your program have high execution times and how you can modify your source code or compiler options to improve performance. The PGI Profiler is included with all PGI products.
A Fortran-friendly Debugger
The PGI graphical debugger for Fortran, C and C++ supports debugging of serial and parallel programs, including MPI, OpenMP and hybrid MPI/OpenMP applications. The PGI Debugger can debug programs on SMP workstations, servers, distributed-memory clusters and hybrid clusters where each node contains multiple multicore x86 processors. It lets you control threads or processes individually or in groups, and examine program state down to the register level. The PGI Debugger is included with all PGI products for x86-64 platforms.