Introduction to nvprof/pgprof
Published:
A simple introduction to nvprof/pgprof tools for analyzing the hotspot of MPI-OpenACC/CUDA hterogeneous parallel code.
Install nvprof (based on Ubuntu)
- sudo apt install nvidia-profiler
- sudo apt install nvidia-visual-profiler
- sudo apt install nvidia-cuda-gdb
- sudo apt install nvidia-cuda-toolkit
Run MPI code with nvprof/pgprof
mpirun -np 4 nvprof –unified-memory-profiling off
–output-profile profile.%q{OMPI_COMM_WORLD_RANK}
–process-name “rank %q{OMPI_COMM_WORLD_RANK}”
–context-name “rank %q{OMPI_COMM_WORLD_RANK}” ./a.outmpirun -n 4 pgprof –unified-memory-profiling off
–output-profile profile.%q{OMPI_COMM_WORLD_RANK}
–process-name “rank %q{OMPI_COMM_WORLD_RANK}”
–context-name “rank %q{OMPI_COMM_WORLD_RANK}” ./a.out
Load profile data in nvprof/pgprof
- /usr/bin/nvvp profile.*
- Or load data with import in pgprof
Analyze dram utilization
- See the available parameters:
- nvprof –query-metrics
- View the read/write bandwidth and DRAM usage:
- nvprof –metrics “dram_read_throughput,dram_write_throughput,dram_utilization” ./a.out
- pgprof –metrics “dram_read_throughput,dram_write_throughput,dram_utilization” ./a.out