On 2/10/2022 6:29 AM, Somalapuram, Amaranath wrote:
On 2/9/2022 1:17 PM, Christian König wrote:
Am 08.02.22 um 16:28 schrieb Alex Deucher:
On Tue, Feb 8, 2022 at 3:17 AM Somalapuram Amaranath
<Amaranath.Somalapuram@xxxxxxx> wrote:
Dump the list of register values to trace event on GPU reset.
Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@xxxxxxx>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 21 ++++++++++++++++++++-
drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 19 +++++++++++++++++++
2 files changed, 39 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 1e651b959141..057922fb7e37 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4534,6 +4534,23 @@ int amdgpu_device_pre_asic_reset(struct
amdgpu_device *adev,
return r;
}
+static int amdgpu_reset_reg_dumps(struct amdgpu_device *adev)
+{
+ int i;
+ uint32_t reg_value[128];
+
+ for (i = 0; adev->reset_dump_reg_list[i] != 0; i++) {
+ if (adev->asic_type >= CHIP_NAVI10)
This check should be against CHIP_VEGA10. Also, this only allows for
GC registers. If we wanted to dump other registers, we'd need a
different macro. It might be better to just use RREG32 here for
everything and then encode the full offset using
SOC15_REG_ENTRY_OFFSET() or a similar macro. Also, we need to think
about how to handle gfxoff in this case. gfxoff needs to be disabled,
or we'll hang the chip if we try to read GC or SDMA registers via
MMIO, which would adversely affect the hang signature.
Well, this should execute right before a GPU reset, so I think it
shouldn't matter if we hang the chip or not as long as the read comes
back correctly (I remember a very long UVD debug session because of
this).
But in general I agree, we should just use RREG32() here and always
encode the full register offset.
Regards,
Christian.
Can I use something like this:
+ reg_value[i] =
+     RREG32(adev->reg_offset[adev->reset_dump_reg_list[i][0]]
+                            [adev->reset_dump_reg_list[i][1]]
+                            [adev->reset_dump_reg_list[i][2]]
+            + adev->reset_dump_reg_list[i][3]);
where each entry holds:
ip       --> adev->reset_dump_reg_list[i][0]
inst     --> adev->reset_dump_reg_list[i][1]
BASE_IDX --> adev->reset_dump_reg_list[i][2]
reg      --> adev->reset_dump_reg_list[i][3]
This requires 4 values from user space for each register.
The existing macros such as RREG32_SOC15** cannot be used here, because
their arguments (like ip##_HWIP or reg##_BASE_IDX) are pasted into
identifiers at compile time and so cannot be supplied from user space.
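To make the 4-value scheme above concrete, here is a minimal, standalone sketch of the lookup. It is not driver code: `struct dump_entry`, `reg_base`, the `MAX_*` limits and all base values are hypothetical stand-ins for `adev->reset_dump_reg_list[i][0..3]` and `adev->reg_offset`. It also adds a bounds check, on the assumption that the indices arrive unvalidated from user space.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical table sizes, chosen only for this illustration. */
#define MAX_IP        4
#define MAX_INST      2
#define MAX_BASE_IDX  3

/* One user-supplied entry: the four values listed in the mail above. */
struct dump_entry {
	uint32_t ip;        /* index of the IP block */
	uint32_t inst;      /* instance of that block */
	uint32_t base_idx;  /* which base segment of the instance */
	uint32_t reg;       /* register offset relative to that base */
};

/* Stand-in for adev->reg_offset[ip][inst][base_idx]; values are made up. */
static const uint32_t reg_base[MAX_IP][MAX_INST][MAX_BASE_IDX] = {
	[0] = { [0] = { 0x1000, 0x2000, 0x3000 } },
	[1] = { [0] = { 0x8000, 0x9000, 0xa000 } },
};

/*
 * Compute the full offset that would be handed to a plain register read.
 * Returns 0 on success, -1 if any index is outside the table.
 */
int full_reg_offset(const struct dump_entry *e, uint32_t *out)
{
	if (e->ip >= MAX_IP || e->inst >= MAX_INST ||
	    e->base_idx >= MAX_BASE_IDX)
		return -1;

	*out = reg_base[e->ip][e->inst][e->base_idx] + e->reg;
	return 0;
}

/* Small self-check showing the intended use. */
void full_reg_offset_example(void)
{
	struct dump_entry ok  = { .ip = 1, .inst = 0, .base_idx = 2, .reg = 0x34 };
	struct dump_entry bad = { .ip = 9, .inst = 0, .base_idx = 0, .reg = 0 };
	uint32_t off = 0;

	assert(full_reg_offset(&ok, &off) == 0 && off == 0xa034);
	assert(full_reg_offset(&bad, &off) == -1);
}
```

The point of the sketch is only that every index is a runtime value, so the whole lookup can be driven from a user-supplied list without any token-pasting macro.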
Why can't we just use a simple array,
adev->reset_dump_reg_list[10], for both the IP and the register offsets?
Userspace can provide the IP engine enum in the first entry of the array,
reset_dump_reg_list[0], and the register offsets in the other entries,
starting from 1. We can convert that enum into the desired engine
substring using an array of char *, something like:
const char *ip_engine_name_substring[] = {
/* Same order as enum amd_hw_ip_block_type */
"GC", "HDP", ......
};
/* engine enum, supplied by userspace in entry 0 */
u32 ip = adev->reset_dump_reg_list[0];
const char *ip_name = ip_engine_name_substring[ip];
for (i = 0; i < 9; i++) {
reg_val = RREG32_SOC15_IP(ip_name, adev->reset_dump_reg_list[i+1]);
}
- Shashank
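A rough standalone sketch of the flat-array layout suggested above, under stated assumptions: `enum ip_block`, `ip_name`, `ip_base` and `resolve_dump_list()` are hypothetical names, the base values are made up, and the list is treated as zero-terminated like the loop in the patch. Because macros of the ip##_HWIP kind need their IP argument at compile time, the sketch indexes a table with the enum value instead of passing the name string to a macro; the string is kept only for printing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset; the real enum amd_hw_ip_block_type has more entries. */
enum ip_block { IP_GC, IP_HDP, IP_COUNT };

/* Name table in enum order, usable for trace or log output. */
static const char *const ip_name[IP_COUNT] = { "GC", "HDP" };

/* Made-up per-IP base offsets standing in for the driver's offset tables. */
static const uint32_t ip_base[IP_COUNT] = { 0x2000, 0x0f20 };

#define DUMP_LIST_LEN 10

/*
 * list[0] holds the IP enum, list[1..] hold register offsets, and a zero
 * offset ends the list early. Absolute offsets are written to out and the
 * matching IP name to *name. Returns the number of registers resolved,
 * or -1 if the IP enum is out of range.
 */
int resolve_dump_list(const uint32_t list[DUMP_LIST_LEN],
		      uint32_t out[DUMP_LIST_LEN - 1],
		      const char **name)
{
	uint32_t ip = list[0];
	int n = 0;

	if (ip >= IP_COUNT)
		return -1;

	*name = ip_name[ip];
	for (int i = 1; i < DUMP_LIST_LEN && list[i] != 0; i++)
		out[n++] = ip_base[ip] + list[i];

	return n;
}
```

One consequence of this layout is that a single list can only cover one IP block; dumping registers from several blocks would need one list per block or a per-entry IP field.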
Alex
+ reg_value[i] = RREG32_SOC15_IP(GC,
adev->reset_dump_reg_list[i]);
+ else
+ reg_value[i] =
RREG32(adev->reset_dump_reg_list[i]);
+ }
+
+ trace_amdgpu_reset_reg_dumps(adev->reset_dump_reg_list, reg_value,
i);
+
+ return 0;
+}
+
int amdgpu_do_asic_reset(struct list_head *device_list_handle,
struct amdgpu_reset_context *reset_context)
{
@@ -4567,8 +4584,10 @@ int amdgpu_do_asic_reset(struct list_head
*device_list_handle,
tmp_adev->gmc.xgmi.pending_reset = false;
if (!queue_work(system_unbound_wq,
&tmp_adev->xgmi_reset_work))
r = -EALREADY;
- } else
+ } else {
+ amdgpu_reset_reg_dumps(tmp_adev);
r = amdgpu_asic_reset(tmp_adev);
+ }
if (r) {
dev_err(tmp_adev->dev, "ASIC reset
failed with error, %d for drm dev, %s",
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
index d855cb53c7e0..3fe33de3564a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
@@ -537,6 +537,25 @@ TRACE_EVENT(amdgpu_ib_pipe_sync,
__entry->seqno)
);
+TRACE_EVENT(amdgpu_reset_reg_dumps,
+ TP_PROTO(long *address, uint32_t *value, int length),
+ TP_ARGS(address, value, length),
+ TP_STRUCT__entry(
+ __array(long, address, 128)
+ __array(uint32_t, value, 128)
+ __field(int, len)
+ ),
+ TP_fast_assign(
+ memcpy(__entry->address, address, 128);
+ memcpy(__entry->value, value, 128);
+ __entry->len = length;
+ ),
+ TP_printk("amdgpu register dump offset: %s value: %s ",
+ __print_array(__entry->address, __entry->len, 8),
+ __print_array(__entry->value, __entry->len, 8)
+ )
+);
+
#undef AMDGPU_JOB_GET_TIMELINE_NAME
#endif
--
2.25.1