Re: [PATCH 3/4] drm/amdgpu: add reset register trace dump function

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 1/24/2022 6:07 PM, Christian König wrote:


Am 24.01.22 um 18:00 schrieb Sharma, Shashank:


On 1/24/2022 5:48 PM, Christian König wrote:
Am 24.01.22 um 17:45 schrieb Sharma, Shashank:
Hello Christian,
Thank for your comments, please fine mine inline:

On 1/24/2022 8:15 AM, Christian König wrote:
Am 21.01.22 um 21:34 schrieb Sharma, Shashank:
From 1c5c552eeddaffd9fb3e7d45ece1b2b28fccc575 Mon Sep 17 00:00:00 2001
From: Somalapuram Amaranath <Amaranath.Somalapuram@xxxxxxx>
Date: Fri, 21 Jan 2022 14:19:10 +0530
Subject: [PATCH 3/4] drm/amdgpu: add reset register trace dump function for
 gfx_v10_0

Implementation of register trace dump function on the AMD GPU resets

Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@xxxxxxx>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h |  8 ++++
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c    | 53 ++++++++++++++++++++++-
 drivers/gpu/drm/amd/include/amd_shared.h  |  1 +
 3 files changed, 60 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
index d855cb53c7e0..c97b53b54333 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
@@ -537,6 +537,14 @@ TRACE_EVENT(amdgpu_ib_pipe_sync,
               __entry->seqno)
 );

+TRACE_EVENT(gfx_v10_0_reset_reg_dumps,
+        TP_PROTO(char *reg_dumps),
+        TP_ARGS(reg_dumps),
+        TP_STRUCT__entry(__string(dumps, reg_dumps)),
+        TP_fast_assign(__assign_str(dumps, reg_dumps);),
+        TP_printk("amdgpu register dump {%s}", __get_str(dumps))
+);
+
 #undef AMDGPU_JOB_GET_TIMELINE_NAME
 #endif

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 16dbe593cba2..05974ed5416d 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -46,7 +46,7 @@
 #include "v10_structs.h"
 #include "gfx_v10_0.h"
 #include "nbio_v2_3.h"
-
+#include "amdgpu_trace.h"
 /*
  * Navi10 has two graphic rings to share each graphic pipe.
  * 1. Primary ring
@@ -188,6 +188,12 @@
 #define RLCG_ERROR_REPORT_ENABLED(adev) \
     (amdgpu_sriov_reg_indirect_mmhub(adev) || amdgpu_sriov_reg_indirect_gc(adev))

+#define N_REGS (17)
+#define DUMP_REG(addr) do {                    \
+        (dump)[i][0] = (addr);                \
+        (dump)[i++][1] =  RREG32_SOC15_IP(GC, addr);    \
+    } while (0)

Those macros need an AMDGPU_ prefix and maybe some better names.

Agree, @Amar pls check this out.

 From the design POV I'm really wondering if it wouldn't be better if userspace defines the registers we want to dump in case of a crash.


I am not so sure about it. This means we have to communicate with the userspace and get the list of registers it wishes to see, do I smell a new IOCTL or DRM property ? I also feel that we have better filtering tools for a trace event in userspace than in kernel.

What do you think ?

Just a writeable debugfs file should probably do it. We have a list of known registers filled in during boot (just reg offsets) and userspace can read/write the file to update it.


Ok, so in this case, what would be our state machine ?
1. gpu_reset happened
2. uevent_sent from work queue
3. userspace goes and writes into debugfs files
4. kernel collects the desired register values

5 (a). kernel send the trace_event with desired values
6 (a). kernel sends another uevent to indicate trace_event

Or

5 (a+b). kernel loads the reg values via debugfs itself.

In both of these case, wouldn't it be too late to collect register values by the time userspace list arrives at kernel ?

Or do you have a better state machine in mind ?

1. system boots, kernel loads a default register list to dump.
2. Userspace optionally updates this register list.
....
3. GPU reset happens, we dump the registers into the trace log.
4. We send udev event to signal the GPU reset to umr or other tool to save and store the trace log.


Got it, let us rework in those lines.
- Shashank

Regards,
Christian.



[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux