Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler

Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx> · Mon, 24 Jan 2022 12:08:00 -0500

It's just infrastructure that you use when you need it.
I never tested it during reset, I think, but we deliberately made it very
self-reliant: you simply iterate over the dump FIFO through the PMI3
register interface and dump out the content. It is currently supposed to
work for the NV family.
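
As a rough illustration of the "iterate a FIFO through a register and dump
the content" idea, here is a minimal userspace sketch. It is not the driver
code: struct mock_fifo, mock_fifo_read() and stb_dump_sketch() are
hypothetical names, and the hardware register is replaced by an in-memory
array so the loop can be run anywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a hardware FIFO exposed through one data register. */
struct mock_fifo {
	const uint32_t *words;	/* backing storage that simulates the FIFO content */
	size_t count;		/* number of words the FIFO holds */
	size_t pos;		/* next word to be popped */
};

/*
 * Simulated register read: each read pops the next word.
 * Returning 0 once the FIFO is empty is an assumption of this sketch.
 */
static uint32_t mock_fifo_read(struct mock_fifo *f)
{
	if (f->pos >= f->count)
		return 0;
	return f->words[f->pos++];
}

/*
 * Drain the FIFO into 'out', copying at most 'max_words' words.
 * Returns the number of words actually copied.
 */
static size_t stb_dump_sketch(struct mock_fifo *f, uint32_t *out, size_t max_words)
{
	size_t n = 0;

	assert(f != NULL && out != NULL);

	while (n < max_words && f->pos < f->count)
		out[n++] = mock_fifo_read(f);

	return n;
}
```

A caller would fill a struct mock_fifo with sample data, pass a buffer to
stb_dump_sketch() and print the returned words. Because the loop depends only
on repeated reads of one data source, it needs no other driver state, which
is the self-reliance described above.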

In case you encounter issues during reset let me know and I will do my 
best to resolve them.

Andrey

On 2022-01-24 11:38, Sharma, Shashank wrote:
Hey Andrey,
That seems like a good idea. May I know if there is a trigger for the STB
dump, or is it just infrastructure which one can use when they
feel a need to dump info? Also, how reliable is the STB infra during
a reset?

Regards
Shashank

On 1/24/2022 5:32 PM, Andrey Grodzovsky wrote:
You can probably add the STB dump we worked on a while ago to your
info dump. A reminder of the feature is here:
https://www.spinics.net/lists/amd-gfx/msg70751.html

Andrey

On 2022-01-21 15:34, Sharma, Shashank wrote:
From 899ec6060eb7d8a3d4d56ab439e4e6cdd74190a4 Mon Sep 17 00:00:00 2001
From: Somalapuram Amaranath <Amaranath.Somalapuram@xxxxxxx>
Date: Fri, 21 Jan 2022 14:19:42 +0530
Subject: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler

This patch adds a GPU reset handler for the Navi ASIC family, which
typically dumps some of the registers and sends a trace event.

V2: Accommodated call to work function to send uevent

Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@xxxxxxx>
Signed-off-by: Shashank Sharma <shashank.sharma@xxxxxxx>
---
 drivers/gpu/drm/amd/amdgpu/nv.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
index 01efda4398e5..ada35d4c5245 100644
--- a/drivers/gpu/drm/amd/amdgpu/nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
@@ -528,10 +528,38 @@ nv_asic_reset_method(struct amdgpu_device *adev)
     }
 }

+static void amdgpu_reset_dumps(struct amdgpu_device *adev)
+{
+    int r = 0, i;
+
+    /* original raven doesn't have full asic reset */
+    if ((adev->apu_flags & AMD_APU_IS_RAVEN) &&
+        !(adev->apu_flags & AMD_APU_IS_RAVEN2))
+        return;
+    for (i = 0; i < adev->num_ip_blocks; i++) {
+        if (!adev->ip_blocks[i].status.valid)
+            continue;
+        if (!adev->ip_blocks[i].version->funcs->reset_reg_dumps)
+            continue;
+        r = adev->ip_blocks[i].version->funcs->reset_reg_dumps(adev);
+
+        if (r)
+            DRM_ERROR("reset_reg_dumps of IP block <%s> failed %d\n",
+                      adev->ip_blocks[i].version->funcs->name, r);
+    }
+
+    /* Schedule work to send uevent */
+    if (!queue_work(system_unbound_wq, &adev->gpu_reset_work))
+        DRM_ERROR("failed to add GPU reset work\n");
+
+    dump_stack();
+}
+
 static int nv_asic_reset(struct amdgpu_device *adev)
 {
     int ret = 0;

+    amdgpu_reset_dumps(adev);
     switch (nv_asic_reset_method(adev)) {
     case AMD_RESET_METHOD_PCI:
         dev_info(adev->dev, "PCI reset\n");