On 1/17/2022 12:03 PM, Somalapuram Amaranath wrote:
AMDGPURESET uevent added to notify userspace, collect dump_stack and trace
Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@xxxxxxx>
---
drivers/gpu/drm/amd/amdgpu/nv.c | 45 +++++++++++++++++++++++++++++++++
1 file changed, 45 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
index 2ec1ffb36b1f..b73147ae41fb 100644
--- a/drivers/gpu/drm/amd/amdgpu/nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
@@ -529,10 +529,55 @@ nv_asic_reset_method(struct amdgpu_device *adev)
}
}
+/**
+ * drm_sysfs_reset_event - generate a DRM uevent
+ * @dev: DRM device
+ *
+ * Send a uevent for the DRM device specified by @dev. Currently we only
+ * set AMDGPURESET=1 in the uevent environment, but this could be expanded to
+ * deal with other types of events.
+ *
+ * Any new uapi should be using the drm_sysfs_connector_status_event()
+ * for uevents on connector status change.
+ */
+void drm_sysfs_reset_event(struct drm_device *dev)
+{
+ char *event_string = "AMDGPURESET=1";
+ char *envp[2] = { event_string, NULL };
+
+ kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
+}
+
+void amdgpu_reset_dumps(struct amdgpu_device *adev)
+{
+ struct drm_device *ddev = adev_to_drm(adev);
+ int r = 0, i;
+
+ /* original raven doesn't have full asic reset */
+ if ((adev->apu_flags & AMD_APU_IS_RAVEN) &&
+ !(adev->apu_flags & AMD_APU_IS_RAVEN2))
+ return;
+ for (i = 0; i < adev->num_ip_blocks; i++) {
+ if (!adev->ip_blocks[i].status.valid)
+ continue;
+ if (!adev->ip_blocks[i].version->funcs->reset_reg_dumps)
+ continue;
+ r = adev->ip_blocks[i].version->funcs->reset_reg_dumps(adev);
+
+ if (r)
+ DRM_ERROR("reset_reg_dumps of IP block <%s> failed %d\n",
+ adev->ip_blocks[i].version->funcs->name, r);
+ }
+
+ drm_sysfs_reset_event(ddev);
+ dump_stack();
+}
+
static int nv_asic_reset(struct amdgpu_device *adev)
{
int ret = 0;
+ amdgpu_reset_dumps(adev);
Alex recently added a patch to reset GPU on suspend. It doesn't make
sense to send an event in such cases, guess the original intention is
for gpu recovery related cases.
Thanks,
Lijo
switch (nv_asic_reset_method(adev)) {
case AMD_RESET_METHOD_PCI:
dev_info(adev->dev, "PCI reset\n");