Re: [PATCH] drm/amdgpu: correctly report gpu recover status

Hi Evan,

> But what I care about more (and what is also the easiest way for me) is a correct return value from the API.

Well, that's exactly the point: the return value is not correct for the API.

For example, if the GPU reset function returned -EFAULT, your program reading the debugfs file would crash with a segmentation fault. That is not correct behavior.

In other words, the result of the GPU reset can't be used as the result of the debugfs read.
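
As a rough illustration of the alternative, here is a minimal userspace sketch that takes the reset result from the file contents instead of from the read's return value. It assumes the debugfs file prints a line of the form "gpu recover %d", as proposed elsewhere in this thread. The helper names parse_recover_line() and read_recover_status() are hypothetical and not part of amdgpu or any library.

```c
#include <stdio.h>

/*
 * Parse one line of the assumed form "gpu recover <r>".
 * Returns 0 and stores <r> in *status on success, or -1 if the line
 * does not match (in which case *status is left untouched).
 */
static int parse_recover_line(const char *line, int *status)
{
	int r;

	if (sscanf(line, "gpu recover %d", &r) != 1)
		return -1;
	*status = r;
	return 0;
}

/*
 * Read the file at path once and report the status it printed.
 * Returns 0 on success, or -1 if the file could not be opened or
 * contained no matching status line.
 */
static int read_recover_status(const char *path, int *status)
{
	char line[128];
	int ret = -1;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		if (parse_recover_line(line, status) == 0) {
			ret = 0;
			break;
		}
	}
	fclose(f);
	return ret;
}
```

A test loop could call read_recover_status() on the debugfs file (typically something like /sys/kernel/debug/dri/0/amdgpu_gpu_recover, though the exact path is an assumption and depends on the system) and stop as soon as the reported status is non-zero. The read itself still succeeds, so a failed reset is never confused with a failed read.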

Regards,
Christian.

On 19.12.19 at 02:48, Quan, Evan wrote:
Hi Christian,

Here is some background for this change:
I'm debugging a random failure on BACO reset.
I used a while loop to run BACO reset tests continuously, hoping it would exit immediately when a failure occurred.
However, due to the wrong return value, it did not. And as you can imagine, the failure scene was ruined.

I can add this: seq_printf(m, "gpu recover %d\n", r);
But what I care about more (and what is also the easiest way for me) is a correct return value from the API.

Regards,
Evan
-----Original Message-----
From: Christian König <ckoenig.leichtzumerken@xxxxxxxxx>
Sent: Wednesday, December 18, 2019 5:57 PM
To: Quan, Evan <Evan.Quan@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Subject: Re: [PATCH] drm/amdgpu: correctly report gpu recover status

On 18.12.19 at 04:25, Evan Quan wrote:
Knowing whether gpu recovery was performed successfully or not is
important for our BACO development.

Change-Id: I0e3ca4dcb65a053eb26bc55ad7431e4a42e160de
Signed-off-by: Evan Quan <evan.quan@xxxxxxx>
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 4 +---
   1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index e9efee04ca23..5dff5c0dd882 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -743,9 +743,7 @@ static int amdgpu_debugfs_gpu_recover(struct seq_file *m, void *data)
   	struct amdgpu_device *adev = dev->dev_private;

   	seq_printf(m, "gpu recover\n");
-	amdgpu_device_gpu_recover(adev, NULL);
-
-	return 0;
+	return amdgpu_device_gpu_recover(adev, NULL);
NAK, what we could do here is the following:

r = amdgpu_device_gpu_recover(....);
seq_printf(m, "gpu recover %d\n", r);

But returning the error code from the GPU recovery to userspace doesn't make
too much sense.

Christian.

   }

   static const struct drm_info_list amdgpu_debugfs_fence_list[] = {
