Ping? On Thu, Mar 6, 2025 at 10:54 AM Alex Deucher <alexander.deucher@xxxxxxx> wrote: > > Describes what debugfs files are available and what > they are used for. > > v2: fix some typos (Mark Glines) > v3: Address comments from Siqueira and Kent > > Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx> > --- > Documentation/gpu/amdgpu/debugfs.rst | 210 +++++++++++++++++++++++++ > Documentation/gpu/amdgpu/debugging.rst | 7 + > Documentation/gpu/amdgpu/index.rst | 1 + > 3 files changed, 218 insertions(+) > create mode 100644 Documentation/gpu/amdgpu/debugfs.rst > > diff --git a/Documentation/gpu/amdgpu/debugfs.rst b/Documentation/gpu/amdgpu/debugfs.rst > new file mode 100644 > index 0000000000000..fdfc1a8773c72 > --- /dev/null > +++ b/Documentation/gpu/amdgpu/debugfs.rst > @@ -0,0 +1,210 @@ > +============== > +AMDGPU DebugFS > +============== > + > +The amdgpu driver provides a number of debugfs files to aid in debugging > +issues in the driver. Thse are usually found in > +/sys/kernel/debug/dri/<num>. > + > +DebugFS Files > +============= > + > +amdgpu_benchmark > +---------------- > + > +Run benchmarks using the DMA engine the driver uses for GPU memory paging. > +Write a number to the file to run the test. The results are written to the > +kernel log. VRAM is on device memory (dGPUs) or cave out (APUs) and GTT > +(Graphics Translation Tables) is system memory that is accessible by the GPU. > +The following tests are available: > + > +- 1: simple test, VRAM to GTT and GTT to VRAM > +- 2: simple test, VRAM to VRAM > +- 3: GTT to VRAM, buffer size sweep, powers of 2 > +- 4: VRAM to GTT, buffer size sweep, powers of 2 > +- 5: VRAM to VRAM, buffer size sweep, powers of 2 > +- 6: GTT to VRAM, buffer size sweep, common display sizes > +- 7: VRAM to GTT, buffer size sweep, common display sizes > +- 8: VRAM to VRAM, buffer size sweep, common display sizes > + > +amdgpu_test_ib > +-------------- > + > +Read this file to run simple IB (Indirect Buffer) tests on all kernel managed > +rings. IBs are command buffers usually generated by userspace applications > +which are submitted to the kernel for execution on an particular GPU engine. > +This just runs the simple IB tests included in the kernel. These tests > +are engine specific and verify that IB submission works. > + > +amdgpu_discovery > +---------------- > + > +Provides raw access to the IP discovery binary provided by the GPU. Read this > +file to acess the raw binary. This is useful for verifying the contents of > +the IP discovery table. It is chip specific. > + > +amdgpu_vbios > +------------ > + > +Provides raw access to the ROM binary image from the GPU. Read this file to > +access the raw binary. This is useful for verifying the contents of the > +video BIOS ROM. It is board specific. > + > +amdgpu_evict_gtt > +---------------- > + > +Evict all buffers from the GTT memory pool. Read this file to evict all > +buffers from this pool. > + > +amdgpu_evict_vram > +----------------- > + > +Evict all buffers from the VRAM memory pool. Read this file to evict all > +buffers from this pool. > + > +amdgpu_gpu_recover > +------------------ > + > +Trigger a GPU reset. Read this file to trigger reset the entire GPU. > +All work currently running on the GPU will be lost. > + > +amdgpu_ring_<name> > +------------------ > + > +Provides read access to the kernel managed ring buffers for each ring <name>. > +These are useful for debugging problems on a particular ring. The ring buffer > +is how the CPU sends commands to the GPU. The CPU writes commands into the > +buffer and then asks the GPU engine to process it. This is the raw binary > +contents of the ring buffer. Use a tool like UMR to decode the rings into human > +readable form. > + > +amdgpu_mqd_<name> > +----------------- > + > +Provides read access to the kernel managed MQD (Memory Queue Descriptor) for > +ring <name> managed by the kernel driver. MQDs define the features of the ring > +and are used to store the ring's state when it is not connected to hardware. > +The driver writes the requested ring features and metadata (GPU addresses of > +the ring itself and associated buffers) to the MQD and the firmware uses the MQD > +to populate the hardware when the ring is mapped to a hardware slot. Only > +available on engines which use MQDs. This provides access to the raw MQD > +binary. > + > +amdgpu_error_<name> > +------------------- > + > +Provides an interface to set an error code on the dma fences associated with > +ring <name>. The error code specified is propogated to all fences associated > +with the ring. Use this to inject a fence error into a ring. > + > +amdgpu_pm_info > +-------------- > + > +Provides human readable information about the power management features > +and state of the GPU. This includes current GFX clock, Memory clock, > +voltages, average SoC power, temperature, GFX load, Memory load, SMU > +feature mask, VCN power state, clock and power gating features. > + > +amdgpu_firmware_info > +-------------------- > + > +Lists the firmware versions for all firmwares used by the GPU. Only > +entries with a non-0 version are valid. If the version is 0, the firmware > +is not valid for the GPU. > + > +amdgpu_fence_info > +----------------- > + > +Shows the last signalled and emitted fence sequence numbers for each > +kernel driver managed ring. Fences are associated with submissions > +to the engine. Emitted fences have been submitted to the ring > +and signalled fences have been signalled by the GPU. Rings with a > +larger emitted fence value have outstanding work that is still being > +processed by the engine that owns that ring. When the emitted and > +signalled fence values are equal, the ring is idle. > + > +amdgpu_gem_info > +--------------- > + > +Lists all of the PIDs using the GPU and the GPU buffers that they have > +allocated. This lists the buffer size, pool (VRAM, GTT, etc.), and buffer > +attributes (CPU access required, CPU cache attributes, etc.). > + > +amdgpu_vm_info > +-------------- > + > +Lists all of the PIDs using the GPU and the GPU buffers that they have > +allocated as well as the status of those buffers relative to that process' > +GPU virtual address space (e.g., evicted, idle, invalidated, etc.). > + > +amdgpu_sa_info > +-------------- > + > +Prints out all of the suballocations (sa) by the suballocation manager in the > +kernel driver. Prints the GPU address, size, and fence info associated > +with each suballocation. The suballocations are used internally within > +the kernel driver for various things. > + > +amdgpu_<pool>_mm > +---------------- > + > +Prints TTM information about the memory pool <pool>. > + > +amdgpu_vram > +----------- > + > +Provides direct access to VRAM. Used by tools like UMR to inspect > +objects in VRAM. > + > +amdgpu_iomem > +------------ > + > +Provides direct access to GTT memory. Used by tools like UMR to inspect > +GTT memory. > + > +amdgpu_regs_* > +------------- > + > +Provides direct access to various register aperatures on the GPU. Used > +by tools like UMR to access GPU registers. > + > +amdgpu_regs2 > +------------ > + > +Provides an IOCTL interface used by UMR for interacting with GPU registers. > + > + > +amdgpu_sensors > +-------------- > + > +Provides an interface to query GPU power metrics (temperature, average > +power, etc.). Used by tools like UMR to query GPU power metrics. > + > + > +amdgpu_gca_config > +----------------- > + > +Provides an interface to query GPU details (Graphics/Compute Array config, > +PCI config, GPU family, etc.). Used by tools like UMR to query GPU details. > + > +amdgpu_wave > +----------- > + > +Used to query GFX/compute wave infomation from the hardware. Used by tools > +like UMR to query GFX/compute wave information. > + > +amdgpu_gpr > +---------- > + > +Used to query GFX/compute GPR (General Purpose Register) infomation from the > +hardware. Used by tools like UMR to query GPRs when debugging shaders. > + > +amdgpu_gprwave > +-------------- > + > +Provides an IOCTL interface used by UMR for interacting with shader waves. > + > +amdgpu_fw_attestation > +--------------------- > + > +Provides an interface for reading back firmware attestation records. > diff --git a/Documentation/gpu/amdgpu/debugging.rst b/Documentation/gpu/amdgpu/debugging.rst > index e75f97d0e4eaf..7cbfea0606e15 100644 > --- a/Documentation/gpu/amdgpu/debugging.rst > +++ b/Documentation/gpu/amdgpu/debugging.rst > @@ -2,6 +2,13 @@ > GPU Debugging > =============== > > +General Debugging Options > +========================= > + > +The DebugFS section provides documentation on a number files to aid in debugging > +issues on the GPU. > + > + > GPUVM Debugging > =============== > > diff --git a/Documentation/gpu/amdgpu/index.rst b/Documentation/gpu/amdgpu/index.rst > index 302d039928ee8..4c75567854cb2 100644 > --- a/Documentation/gpu/amdgpu/index.rst > +++ b/Documentation/gpu/amdgpu/index.rst > @@ -16,5 +16,6 @@ Next (GCN), Radeon DNA (RDNA), and Compute DNA (CDNA) architectures. > thermal > driver-misc > debugging > + debugfs > process-isolation > amdgpu-glossary > -- > 2.48.1 >