nvprof --help (Alibaba Cloud Developer Community)

Usage: nvprof [options] [application] [application-arguments]
Options:
--aggregate-mode <on|off>
Turn on/off aggregate mode for events and metrics specified
by subsequent "--events" and "--metrics" options. Those
event/metric values will be collected for each domain instance,
instead of the whole device. Allowed values:
on - turn on aggregate mode (default)
off - turn off aggregate mode
--analysis-metrics
Collect profiling data that can be imported to Visual Profiler's
"analysis" mode. Note: Use "--export-profile" to specify
an export file.
--concurrent-kernels <on|off>
Turn on/off concurrent kernel execution. If concurrent kernel
execution is off, all kernels running on one device will
be serialized. Allowed values:
on - turn on concurrent kernel execution (default)
off - turn off concurrent kernel execution
--continuous-sampling-interval <interval>
Set the continuous mode sampling interval in milliseconds.
Minimum is 1 ms. Default is 2 ms.
--dependency-analysis
Generate event dependency graph for host and device activities
and run dependency analysis.
--device-buffer-size <size in MBs>
Set the device memory size (in MBs) reserved for storing
profiling data for non-CDP operations, especially for concurrent
kernel tracing, for each buffer on a context. The default
value is 8MB. The size should be a positive integer.
--device-cdp-buffer-size <size in MBs>
Set the device memory size (in MBs) reserved for storing
profiling data for CDP operations for each buffer on a context.
The default value is 8MB. The size should be a positive
integer.
--devices <device ids>
Change the scope of subsequent "--events", "--metrics", "--query-events"
and "--query-metrics" options.
Allowed values:
all - change scope to all valid devices
comma-separated device IDs - change scope to specified
devices
--event-collection-mode <mode>
Choose event collection mode for all events/metrics. Allowed
values:
kernel - events/metrics are collected only for durations
of kernel executions (default)
continuous - events/metrics are collected for the duration
of the application. This is not applicable to non-Tesla devices.
This mode is compatible only with NVLink events/metrics.
This mode is incompatible with "--profile-all-processes"
or "--profile-child-processes" or "--replay-mode kernel"
or "--replay-mode application".
-e, --events <event names>
Specify the events to be profiled on certain device(s). Multiple
event names separated by comma can be specified. Which device(s)
are profiled is controlled by the "--devices" option. Otherwise
events will be collected on all devices.
For a list of available events, use "--query-events".
Use "--events all" to profile all events available for each
device.
Use "--devices" and "--kernels" to select a specific kernel
invocation.
--kernel-latency-timestamps <on|off>
Turn on/off collection of kernel latency timestamps, namely
queued and submitted. The queued timestamp is captured when
a kernel launch command was queued into the CPU command
buffer. The submitted timestamp denotes when the CPU command
buffer containing this kernel launch was submitted to the
GPU. Turning this option on may incur an overhead during
profiling. Allowed values:
on - turn on collection of kernel latency timestamps
off - turn off collection of kernel latency timestamps
(default)
--kernels <kernel path syntax>
Change the scope of subsequent "--events", "--metrics" options.
The syntax is as follows:
<kernel name>
Limit scope to given kernel name.
or
<context id/name>:<stream id/name>:<kernel name>:<invocation>
The context/stream IDs, names, kernel name and invocation
can be regular expressions. Empty string matches any number
of characters. If <context id/name> or <stream id/name>
is a positive number, it's strictly matched against the
CUDA context/stream ID. Otherwise it's treated as a regular
expression and matched against the context/stream name specified
by the NVTX library. If the invocation count is a positive
number, it's strictly matched against the invocation of
the kernel. Otherwise it's treated as a regular expression.
Example: --kernels "1:foo:bar:2" will profile any kernel
whose name contains "bar" and is the 2nd instance on context
1 and on stream named "foo".
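
To make the four-field form easier to read, here is a minimal Python sketch that splits a "--kernels" value into its parts. It is only an illustration of the syntax described above, not nvprof's own parser. The names `KernelScope`, `split_kernel_scope` and `is_strict_id` are hypothetical, and the naive split on ":" assumes that no field contains a colon itself (for example a C++ name with "::"), which real input may violate.

```python
from typing import NamedTuple


class KernelScope(NamedTuple):
    """Fields of a --kernels value; an empty field matches anything."""
    context: str
    stream: str
    kernel: str
    invocation: str


def split_kernel_scope(spec: str) -> KernelScope:
    """Split a --kernels value into its fields (illustrative, hypothetical helper)."""
    parts = spec.split(":")
    if len(parts) == 1:
        # Short form: only a kernel name, all other fields left open.
        return KernelScope("", "", parts[0], "")
    if len(parts) != 4:
        raise ValueError(f"expected 1 or 4 colon-separated fields, got {len(parts)}")
    return KernelScope(*parts)


def is_strict_id(field: str) -> bool:
    """A positive number is matched strictly; anything else is treated as a regex."""
    return field.isdecimal() and int(field) > 0


scope = split_kernel_scope("1:foo:bar:2")
print(scope)
print(is_strict_id(scope.context), is_strict_id(scope.stream))
```

For the example value "1:foo:bar:2", the context and invocation fields are positive numbers and would be matched strictly, while "foo" and "bar" would be treated as regular expressions.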
-m, --metrics <metric names>
Specify the metrics to be profiled on certain device(s).
Multiple metric names separated by comma can be specified.
Which device(s) are profiled is controlled by the "--devices"
option. Otherwise metrics will be collected on all devices.
For a list of available metrics, use "--query-metrics".
Use "--metrics all" to profile all metrics available for
each device.
Use "--devices" and "--kernels" to select a specific kernel
invocation.
Note: "--metrics all" does not include some metrics which
are needed for Visual Profiler's source level analysis.
For that, use "--analysis-metrics".
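
Because "--devices" and "--kernels" change the scope of subsequent "--events" and "--metrics" options, the order of arguments matters. The following Python sketch assembles an argument list with the scope options placed first. The helper name `build_metric_command` and the application path `./app` are hypothetical; only option names listed in this help text are used, and the command is built but not executed.

```python
from typing import List, Optional, Sequence


def build_metric_command(
    app: Sequence[str],
    metrics: Sequence[str] = ("all",),
    devices: Optional[str] = None,
    kernels: Optional[str] = None,
) -> List[str]:
    """Build an nvprof argument list with scope options before --metrics."""
    cmd = ["nvprof"]
    # Scope options come first so that they apply to the --metrics that follows.
    if devices is not None:
        cmd += ["--devices", devices]
    if kernels is not None:
        cmd += ["--kernels", kernels]
    cmd += ["--metrics", ",".join(metrics)]
    # The application and its arguments go last.
    cmd += list(app)
    return cmd


# "./app" is a placeholder for the CUDA application being profiled.
print(build_metric_command(["./app"], devices="0", kernels="1:foo:bar:2"))
```

The resulting list could be passed to `subprocess.run` on a machine where nvprof is installed.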
--pc-sampling-period <period>
Specify PC Sampling period in cycles, at which the sampling
records will be dumped. Allowed values for the period are
integers from 5 to 31, inclusive.
This will set the sampling period to (2^period) cycles.
Default value is a number between 5 and 12 based on the setup.
Note: Only available for GM20X+.
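
As a small worked example of the arithmetic, the Python sketch below converts a "--pc-sampling-period" value into a cycle count, using the 2^period rule and the 5 to 31 range stated above. The function name `pc_sampling_cycles` is hypothetical.

```python
def pc_sampling_cycles(period: int) -> int:
    """Return the sampling period in cycles for a --pc-sampling-period value."""
    if not 5 <= period <= 31:
        raise ValueError("period must be an integer from 5 to 31, inclusive")
    # The option value is an exponent: the period is 2^period cycles.
    return 1 << period


print(pc_sampling_cycles(5))   # 32
print(pc_sampling_cycles(12))  # 4096
```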
--profile-all-processes
Profile all processes launched by the same user who launched
this nvprof instance. Note: Only one instance of nvprof
can run with this option at the same time. Under this mode,
there's no need to specify an application to run.
--profile-api-trace <none|runtime|driver|all>
Turn on/off CUDA runtime/driver API tracing. Allowed values:
none - turn off API tracing
runtime - only turn on CUDA runtime API tracing
driver - only turn on CUDA driver API tracing
all - turn on all API tracing (default)
--profile-child-processes
Profile the application and all child processes launched
by it.
--profile-from-start <on|off>
Enable/disable profiling from the start of the application.
If it's disabled, the application can use {cu,cuda}Profiler{Start,Stop}
to turn on/off profiling. Allowed values:
on - enable profiling from start (default)
off - disable profiling from start
--profiling-semaphore-pool-size <count>
Set the profiling semaphore pool size reserved for storing
profiling data for serialized kernels and memory operations
for each context. The default value is 65536. The size should
be a positive integer.
--query-events
List all the events available on the device(s). Device(s)
queried can be controlled by the "--devices" option.
--query-metrics
List all the metrics available on the device(s). Device(s)
queried can be controlled by the "--devices" option.
--replay-mode <mode>
Choose replay mode used when not all events/metrics can be
collected in a single run. Allowed values:
disabled - replay is disabled, events/metrics that couldn't
be profiled will be dropped
kernel - each kernel invocation is replayed (default)
application - the entire application is replayed.
This mode is incompatible with "--profile-all-processes"
or "--profile-child-processes".
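
The incompatibilities stated for "--event-collection-mode continuous" and "--replay-mode application" can be collected into a small pre-flight check. The Python sketch below is an assumption-laden illustration: the helper `find_conflicts` is hypothetical, it encodes only the rules written in this help text, and it looks only at explicitly supplied values (it does not model defaults).

```python
from typing import Dict, List

MULTI_PROCESS_FLAGS = ("--profile-all-processes", "--profile-child-processes")


def find_conflicts(opts: Dict[str, str]) -> List[str]:
    """Return documented conflicts among explicitly given options.

    opts maps option names to values; flags without a value map to "".
    """
    problems: List[str] = []
    multi_process = [flag for flag in MULTI_PROCESS_FLAGS if flag in opts]
    replay = opts.get("--replay-mode")

    if opts.get("--event-collection-mode") == "continuous":
        for flag in multi_process:
            problems.append(f"--event-collection-mode continuous conflicts with {flag}")
        if replay in ("kernel", "application"):
            problems.append(
                f"--event-collection-mode continuous conflicts with --replay-mode {replay}"
            )

    if replay == "application":
        for flag in multi_process:
            problems.append(f"--replay-mode application conflicts with {flag}")

    return problems


print(find_conflicts({"--replay-mode": "application", "--profile-child-processes": ""}))
```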
-a, --source-level-analysis <source level analysis names>
Specify the source level metrics to be profiled on a certain
kernel invocation. Use "--devices" and "--kernels" to select
a specific kernel invocation. Allowed values: one or more
of the following, separated by commas
global_access: global access
shared_access: shared access
branch: divergent branch
instruction_execution: instruction execution
pc_sampling: pc sampling, available only for GM20X+
Note: Use "--export-profile" to specify an export file.
--system-profiling <on|off>
Turn on/off power, clock, and thermal profiling. Allowed
values:
on - turn on system profiling
off - turn off system profiling (default)
-t, --timeout <seconds>
Set an execution timeout (in seconds) for the CUDA application.
Note: Timeout starts counting from the moment the CUDA driver
is initialized. If the application doesn't call any CUDA
APIs, timeout won't be triggered.
--track-memory-allocations <on|off>
Turn on/off tracking of memory operations, which involves
recording timestamps, memory size, memory type and program
counters of the memory allocations and frees. Turning this
option on may incur an overhead during profiling. Allowed
values:
on - turn on tracking of memory allocations and
frees
off - turn off tracking of memory allocations and
frees (default)
--unified-memory-profiling <per-process-device|off>
Configure unified memory profiling. Allowed values:
per-process-device - collect counts for each process
and each device (default)
off - turn off unified memory profiling
--cpu-profiling <on|off>
Turn on CPU profiling. Note: CPU profiling is not supported
in multi-process mode.
--cpu-profiling-explain-ccff <filename>
Path to a PGI pgexplain.xml file that should be used to interpret
Common Compiler Feedback Format (CCFF) messages.
--cpu-profiling-frequency <frequency>
Set the CPU profiling frequency in samples per second. Default
is 25Hz. Maximum is 500Hz.
--cpu-profiling-max-depth <depth>
Set the maximum depth of each call stack. Zero means no limit.
Default is zero.
--cpu-profiling-mode <flat|top-down|bottom-up>
Set the output mode of CPU profiling. Allowed values:
flat - Show flat profile
top-down - Show parent functions at the top
bottom-up - Show parent functions at the bottom
(default)
--cpu-profiling-percentage-threshold <threshold>
Filter out the entries that are below the set percentage
threshold. The limit should be an integer between 0 and
100, inclusive. Zero means no limit. Default is zero.
--cpu-profiling-scope <function|instruction>
Choose the profiling scope. Allowed values:
function - Each level in the stack trace represents
a distinct function (default)
instruction - Each level in the stack trace represents
a distinct instruction address
--cpu-profiling-show-ccff <on|off>
Choose whether to print Common Compiler Feedback Format (CCFF)
messages embedded in the binary. Note: this option implies
"--cpu-profiling-scope instruction". Default is off.
--cpu-profiling-show-library <on|off>
Choose whether to print the library name for each sample.
--cpu-profiling-thread-mode <separated|aggregated>
Set the thread mode of CPU profiling. Allowed values:
separated - Show separate profile for each thread
aggregated - Aggregate data from all threads (default)
--cpu-profiling-unwind-stack <on|off>
Choose whether to unwind the CPU call-stack at each sample
point. Default is on.
--openacc-profiling <on|off>
Enable/disable recording information from the OpenACC profiling
interface. Note: whether the OpenACC profiling interface is available
depends on the OpenACC runtime. Default is on.
--context-name <name>
Name of the CUDA context.
"%i" in the context name string is replaced with
the ID of the context.
"%p" in the context name string is replaced with
the process ID of the application being profiled.
"%q{<ENV>}" in the context name string is replaced
with the value of the environment variable "<ENV>". If the
environment variable is not set it's an error.
"%h" in the context name string is replaced with
the hostname of the system.
"%%" in the context name string is replaced with
"%". Any other character following "%" is illegal.
--csv
Use comma-separated values in the output.
--demangling <on|off>
Turn on/off C++ name demangling of function names. Allowed
values:
on - turn on demangling (default)
off - turn off demangling
-u, --normalized-time-unit <s|ms|us|ns|col|auto>
Specify the unit of time that will be used in the output.
Allowed values:
s - second, ms - millisecond, us - microsecond,
ns - nanosecond
col - a fixed unit for each column
auto (default) - the scale is chosen for each value
based on its length.
--openacc-summary-mode <mode>
Set how durations are computed in the OpenACC summary. Allowed
values:
exclusive: show exclusive times (default)
inclusive: show inclusive times
--print-api-summary
Print a summary of CUDA runtime/driver API calls.
--print-api-trace
Print CUDA runtime/driver API trace.
--print-dependency-analysis-trace
Print dependency analysis trace.
--print-gpu-summary
Print a summary of the activities on the GPU (including CUDA
kernels and memcpy's/memset's).
--print-gpu-trace
Print individual kernel invocations (including CUDA memcpy's/memset's)
and sort them in chronological order. In event/metric profiling
mode, show events/metrics for each kernel invocation.
--print-openacc-constructs
Include parent construct names in OpenACC profile.
--print-openacc-summary
Print a summary of the OpenACC profile.
--print-openacc-trace
Print a trace of the OpenACC profile.
-s, --print-summary
Print a summary of the profiling result on screen. Note:
This is the default unless "--export-profile" or other print
options are used.
--print-summary-per-gpu
Print a summary of the profiling result for each GPU.
--process-name <name>
Name of the process.
"%p" in the process name string is replaced with
the process ID of the application being profiled.
"%q{<ENV>}" in the process name string is replaced
with the value of the environment variable "<ENV>". If the
environment variable is not set it's an error.
"%h" in the process name string is replaced with
the hostname of the system.
"%%" in the process name string is replaced with
"%". Any other character following "%" is illegal.
--quiet
Suppress all nvprof output.
--stream-name <name>
Name of the CUDA stream.
"%i" in the stream name string is replaced with the
ID of the stream.
"%p" in the stream name string is replaced with
the process ID of the application being profiled.
"%q{<ENV>}" in the stream name string is replaced
with the value of the environment variable "<ENV>". If the
environment variable is not set it's an error.
"%h" in the stream name string is replaced with
the hostname of the system.
"%%" in the stream name string is replaced with
"%". Any other character following "%" is illegal.
-o, --export-profile <filename>
Export the result file which can be imported later or opened
by the NVIDIA Visual Profiler.
"%p" in the file name string is replaced with the
process ID of the application being profiled.
"%q{<ENV>}" in the file name string is replaced
with the value of the environment variable "<ENV>". If the
environment variable is not set it's an error.
"%h" in the file name string is replaced with the
hostname of the system.
"%%" in the file name string is replaced with "%".
Any other character following "%" is illegal.
By default, this option disables the summary output. Note:
If the application being profiled creates child processes,
or if '--profile-all-processes' is used, the "%p" format
is needed to get correct export files for each process.
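
The "%p", "%q{<ENV>}", "%h" and "%%" substitutions described for the export file name can be mimicked with a short Python sketch, which may help when predicting the names of per-process files. This is an approximation written from the rules above, not nvprof's implementation. The function `expand_name` and the sample file names are hypothetical, and the "%i", "%1" and "%2" forms used by other options are deliberately not handled.

```python
import os
import re
import socket
from typing import Mapping, Optional

# Matches "%q{NAME}", "%<any single character>", or a lone trailing "%".
_TOKEN = re.compile(r"%(q\{([^}]*)\}|.)?", re.DOTALL)


def expand_name(
    template: str,
    pid: int,
    env: Optional[Mapping[str, str]] = None,
    hostname: Optional[str] = None,
) -> str:
    """Expand %p, %q{ENV}, %h and %% in a file name template."""
    env = os.environ if env is None else env
    hostname = socket.gethostname() if hostname is None else hostname

    def replace(match: "re.Match[str]") -> str:
        token = match.group(1)
        if token == "p":
            return str(pid)
        if token == "h":
            return hostname
        if token == "%":
            return "%"
        if token is not None and token.startswith("q{"):
            name = match.group(2)
            if name not in env:
                # The help text says an unset variable is an error.
                raise KeyError(f"environment variable {name!r} is not set")
            return env[name]
        # Any other character following "%" is illegal.
        raise ValueError(f"illegal sequence {match.group(0)!r}")

    return _TOKEN.sub(replace, template)


print(expand_name("out.%p.%h.nvvp", pid=1234, env={}, hostname="node1"))
```

With the sample values shown, the template "out.%p.%h.nvvp" expands to "out.1234.node1.nvvp", so each profiled process would get its own export file.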
-f, --force-overwrite
Force overwriting all output files (any existing files will
be overwritten).
-i, --import-profile <filename>
Import a result profile from a previous run.
--log-file <filename>
Make nvprof send all its output to the specified file, or
one of the standard channels. The file will be overwritten.
If the file doesn't exist, a new one will be created.
"%1" as the whole file name indicates standard output
channel (stdout).
"%2" as the whole file name indicates standard error
channel (stderr). Note: This is the default.
"%p" in the file name string is replaced with the
process ID of the application being profiled.
"%q{<ENV>}" in the file name string is replaced
with the value of the environment variable "<ENV>". If the
environment variable is not set it's an error.
"%h" in the file name string is replaced with the
hostname of the system.
"%%" in the file name is replaced with "%".
Any other character following "%" is illegal.
--print-nvlink-topology
Print NVLink topology.
-h, --help
Print this help information.
-V, --version
Print version information of this tool.