RE: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU reset

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



[AMD Official Use Only]



-----Original Message-----
From: Koenig, Christian <Christian.Koenig@xxxxxxx> 
Sent: Monday, January 17, 2022 3:33 PM
To: Somalapuram, Amaranath <Amaranath.Somalapuram@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Sharma, Shashank <Shashank.Sharma@xxxxxxx>
Subject: Re: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU reset

Am 17.01.22 um 11:01 schrieb Somalapuram, Amaranath:
> [AMD Official Use Only]
>
>
>
> -----Original Message-----
> From: Koenig, Christian <Christian.Koenig@xxxxxxx>
> Sent: Monday, January 17, 2022 12:57 PM
> To: Somalapuram, Amaranath <Amaranath.Somalapuram@xxxxxxx>; 
> amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Sharma, Shashank 
> <Shashank.Sharma@xxxxxxx>
> Subject: Re: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU 
> reset
>
> Am 17.01.22 um 07:33 schrieb Somalapuram Amaranath:
>> AMDGPURESET uevent added to notify userspace, collect dump_stack and 
>> trace
>>
>> Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@xxxxxxx>
>> ---
>>    drivers/gpu/drm/amd/amdgpu/nv.c | 45 +++++++++++++++++++++++++++++++++
>>    1 file changed, 45 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c 
>> b/drivers/gpu/drm/amd/amdgpu/nv.c index 2ec1ffb36b1f..b73147ae41fb
>> 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/nv.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
>> @@ -529,10 +529,55 @@ nv_asic_reset_method(struct amdgpu_device *adev)
>>    	}
>>    }
>>    
>> +/**
>> + * drm_sysfs_reset_event - generate a DRM uevent
>> + * @dev: DRM device
>> + *
>> + * Send a uevent for the DRM device specified by @dev.  Currently we 
>> +only
>> + * set AMDGPURESET=1 in the uevent environment, but this could be 
>> +expanded to
>> + * deal with other types of events.
>> + *
>> + * Any new uapi should be using the
>> +drm_sysfs_connector_status_event()
>> + * for uevents on connector status change.
>> + */
>> +void drm_sysfs_reset_event(struct drm_device *dev)
> This should probably go directly into the DRM subsystem.
>
>> +{
>> +	char *event_string = "AMDGPURESET=1";
>> +	char *envp[2] = { event_string, NULL };
>> +
>> +	kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
> As I said this approach is a clear NAK. We can't allocate any memory when we do a reset.
>
> Regards,
> Christian.
>
> Can I do something like this:
>
> void drm_sysfs_reset_event(struct drm_device *dev)
>   {
> -       char *event_string = "AMDGPURESET=1";
> -       char *envp[2] = { event_string, NULL };
> +       char **envp;
> +
> +       envp = kcalloc(2,sizeof(char *), GFP_ATOMIC);
> +       envp[0] = kcalloc(30, sizeof(char), GFP_ATOMIC);
> +       envp[1] = kcalloc(30, sizeof(char), GFP_ATOMIC);

No, not really. kobject_uevent_env() will still allocate memory without GFP_ATOMIC.

I think the whole approach of using udev won't work for this.

Regards,
Christian.

I have tested it with sample applications: 
Gpu reset:
sudo cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover

And I never missed the AMDGPURESET=1 event in user space, 
even with continues resets with sudo cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover .

Regards,
S.Amarnath
> +
> +       strcpy(envp[0], "AMDGPURESET=1");
> +       strcpy(envp[1], "");
> +
>
>          kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, 
> envp);
> +
> +       kfree(envp[0]);
> +       kfree(envp[1]);
> +       kfree(envp);
>   }
>
> Regards,
> S.Amarnath
>
>> +}
>> +
>> +void amdgpu_reset_dumps(struct amdgpu_device *adev) {
>> +	struct drm_device *ddev = adev_to_drm(adev);
>> +	int r = 0, i;
>> +
>> +	/* original raven doesn't have full asic reset */
>> +	if ((adev->apu_flags & AMD_APU_IS_RAVEN) &&
>> +		!(adev->apu_flags & AMD_APU_IS_RAVEN2))
>> +		return;
>> +	for (i = 0; i < adev->num_ip_blocks; i++) {
>> +		if (!adev->ip_blocks[i].status.valid)
>> +			continue;
>> +		if (!adev->ip_blocks[i].version->funcs->reset_reg_dumps)
>> +			continue;
>> +		r = adev->ip_blocks[i].version->funcs->reset_reg_dumps(adev);
>> +
>> +		if (r)
>> +			DRM_ERROR("reset_reg_dumps of IP block <%s> failed %d\n",
>> +					adev->ip_blocks[i].version->funcs->name, r);
>> +	}
>> +
>> +	drm_sysfs_reset_event(ddev);
>> +	dump_stack();
>> +}
>> +
>>    static int nv_asic_reset(struct amdgpu_device *adev)
>>    {
>>    	int ret = 0;
>>    
>> +	amdgpu_reset_dumps(adev);
>>    	switch (nv_asic_reset_method(adev)) {
>>    	case AMD_RESET_METHOD_PCI:
>>    		dev_info(adev->dev, "PCI reset\n");




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux