Introduction to nvprof/pgprof

less than 1 minute read

Published:

A simple introduction to nvprof/pgprof tools for analyzing the hotspot of MPI-OpenACC/CUDA hterogeneous parallel code.

Install nvprof (based on Ubuntu)

  • sudo apt install nvidia-profiler
  • sudo apt install nvidia-visual-profiler
  • sudo apt install nvidia-cuda-gdb
  • sudo apt install nvidia-cuda-toolkit

Run MPI code with nvprof/pgprof

  • mpirun -np 4 nvprof –unified-memory-profiling off
    –output-profile profile.%q{OMPI_COMM_WORLD_RANK}
    –process-name “rank %q{OMPI_COMM_WORLD_RANK}”
    –context-name “rank %q{OMPI_COMM_WORLD_RANK}” ./a.out

  • mpirun -n 4 pgprof –unified-memory-profiling off
    –output-profile profile.%q{OMPI_COMM_WORLD_RANK}
    –process-name “rank %q{OMPI_COMM_WORLD_RANK}”
    –context-name “rank %q{OMPI_COMM_WORLD_RANK}” ./a.out

Load profile data in nvprof/pgprof

  • /usr/bin/nvvp profile.*
  • Or load data with import in pgprof

Analyze dram utilization

  • See the available parameters:
    • nvprof –query-metrics
  • View the read/write bandwidth and DRAM usage:
    • nvprof –metrics “dram_read_throughput,dram_write_throughput,dram_utilization” ./a.out
    • pgprof –metrics “dram_read_throughput,dram_write_throughput,dram_utilization” ./a.out