Re: [PATCH] drm/amdgpu: Move reset domain locking in DPC handler

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



My bad, I see u already fixed this in amd-staging-drm-next. We had an issue in an internal branch with this and just reinvented the wheel :))

Andrey

On 2022-04-14 10:32, Andrey Grodzovsky wrote:
Yea, i need to improve it a bit, ignore this one, will be back with V2.

Andrey

On 2022-04-14 03:12, Chen, Guchun wrote:
It's in amdgpu_pci_resume.

Andrey, shall we modify the code accordingly in amdgpu_pci_resume as well? Otherwise, an unset/unlock leak will happen when pci_channel_state != pci_channel_io_frozen.

Regards,
Guchun

-----Original Message-----
From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Christian König
Sent: Thursday, April 14, 2022 2:40 PM
To: Grodzovsky, Andrey <Andrey.Grodzovsky@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Antonovitch, Anatoli <Anatoli.Antonovitch@xxxxxxx>
Subject: Re: [PATCH] drm/amdgpu: Move reset domain locking in DPC handler



Am 13.04.22 um 21:31 schrieb Andrey Grodzovsky:
Lock reset domain unconditionally because on resume we unlock it
unconditionally.
This solved mutex deadlock when handling both FATAL and non FATAL PCI
errors one after another.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx>
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 14 +++++++-------
   1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 1cc488a767d8..c65f25e3a0fc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5531,18 +5531,18 @@ pci_ers_result_t
amdgpu_pci_error_detected(struct pci_dev *pdev, pci_channel_sta
          adev->pci_channel_state = state;
   +    /*
+     * Locking adev->reset_domain->sem will prevent any external access
+     * to GPU during PCI error recovery
+     */
+    amdgpu_device_lock_reset_domain(adev->reset_domain);
+    amdgpu_device_set_mp1_state(adev);
+
       switch (state) {
       case pci_channel_io_normal:
           return PCI_ERS_RESULT_CAN_RECOVER;
BTW: Where are we unlocking that again?

       /* Fatal error, prepare for slot reset */
       case pci_channel_io_frozen:
-        /*
-         * Locking adev->reset_domain->sem will prevent any external access
-         * to GPU during PCI error recovery
-         */
- amdgpu_device_lock_reset_domain(adev->reset_domain);
-        amdgpu_device_set_mp1_state(adev);
-
           /*
            * Block any work scheduling as we do for regular GPU reset
            * for the duration of the recovery



[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux