[AMD Official Use Only] -----Original Message----- From: Koenig, Christian <Christian.Koenig@xxxxxxx> Sent: Monday, January 17, 2022 3:33 PM To: Somalapuram, Amaranath <Amaranath.Somalapuram@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Sharma, Shashank <Shashank.Sharma@xxxxxxx> Subject: Re: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU reset Am 17.01.22 um 11:01 schrieb Somalapuram, Amaranath: > [AMD Official Use Only] > > > > -----Original Message----- > From: Koenig, Christian <Christian.Koenig@xxxxxxx> > Sent: Monday, January 17, 2022 12:57 PM > To: Somalapuram, Amaranath <Amaranath.Somalapuram@xxxxxxx>; > amd-gfx@xxxxxxxxxxxxxxxxxxxxx > Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Sharma, Shashank > <Shashank.Sharma@xxxxxxx> > Subject: Re: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU > reset > > Am 17.01.22 um 07:33 schrieb Somalapuram Amaranath: >> AMDGPURESET uevent added to notify userspace, collect dump_stack and >> trace >> >> Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@xxxxxxx> >> --- >> drivers/gpu/drm/amd/amdgpu/nv.c | 45 +++++++++++++++++++++++++++++++++ >> 1 file changed, 45 insertions(+) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c >> b/drivers/gpu/drm/amd/amdgpu/nv.c index 2ec1ffb36b1f..b73147ae41fb >> 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/nv.c >> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c >> @@ -529,10 +529,55 @@ nv_asic_reset_method(struct amdgpu_device *adev) >> } >> } >> >> +/** >> + * drm_sysfs_reset_event - generate a DRM uevent >> + * @dev: DRM device >> + * >> + * Send a uevent for the DRM device specified by @dev. Currently we >> +only >> + * set AMDGPURESET=1 in the uevent environment, but this could be >> +expanded to >> + * deal with other types of events. >> + * >> + * Any new uapi should be using the >> +drm_sysfs_connector_status_event() >> + * for uevents on connector status change. >> + */ >> +void drm_sysfs_reset_event(struct drm_device *dev) > This should probably go directly into the DRM subsystem. > >> +{ >> + char *event_string = "AMDGPURESET=1"; >> + char *envp[2] = { event_string, NULL }; >> + >> + kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp); > As I said this approach is a clear NAK. We can't allocate any memory when we do a reset. > > Regards, > Christian. > > Can I do something like this: > > void drm_sysfs_reset_event(struct drm_device *dev) > { > - char *event_string = "AMDGPURESET=1"; > - char *envp[2] = { event_string, NULL }; > + char **envp; > + > + envp = kcalloc(2,sizeof(char *), GFP_ATOMIC); > + envp[0] = kcalloc(30, sizeof(char), GFP_ATOMIC); > + envp[1] = kcalloc(30, sizeof(char), GFP_ATOMIC); No, not really. kobject_uevent_env() will still allocate memory without GFP_ATOMIC. I think the whole approach of using udev won't work for this. Regards, Christian. I have tested it with sample applications: Gpu reset: sudo cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover And I never missed the AMDGPURESET=1 event in user space, even with continues resets with sudo cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover . Regards, S.Amarnath > + > + strcpy(envp[0], "AMDGPURESET=1"); > + strcpy(envp[1], ""); > + > > kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, > envp); > + > + kfree(envp[0]); > + kfree(envp[1]); > + kfree(envp); > } > > Regards, > S.Amarnath > >> +} >> + >> +void amdgpu_reset_dumps(struct amdgpu_device *adev) { >> + struct drm_device *ddev = adev_to_drm(adev); >> + int r = 0, i; >> + >> + /* original raven doesn't have full asic reset */ >> + if ((adev->apu_flags & AMD_APU_IS_RAVEN) && >> + !(adev->apu_flags & AMD_APU_IS_RAVEN2)) >> + return; >> + for (i = 0; i < adev->num_ip_blocks; i++) { >> + if (!adev->ip_blocks[i].status.valid) >> + continue; >> + if (!adev->ip_blocks[i].version->funcs->reset_reg_dumps) >> + continue; >> + r = adev->ip_blocks[i].version->funcs->reset_reg_dumps(adev); >> + >> + if (r) >> + DRM_ERROR("reset_reg_dumps of IP block <%s> failed %d\n", >> + adev->ip_blocks[i].version->funcs->name, r); >> + } >> + >> + drm_sysfs_reset_event(ddev); >> + dump_stack(); >> +} >> + >> static int nv_asic_reset(struct amdgpu_device *adev) >> { >> int ret = 0; >> >> + amdgpu_reset_dumps(adev); >> switch (nv_asic_reset_method(adev)) { >> case AMD_RESET_METHOD_PCI: >> dev_info(adev->dev, "PCI reset\n");