Patch "drm/amd: Fail the suspend if resources can't be evicted" has been added to the 6.0-stable tree

This is a note to let you know that I've just added the patch titled

    drm/amd: Fail the suspend if resources can't be evicted

to the 6.0-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     drm-amd-fail-the-suspend-if-resources-can-t-be-evict.patch
and it can be found in the queue-6.0 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 17cf9287a38bceff20e382a8cb7e4125910088a4
Author: Mario Limonciello <mario.limonciello@xxxxxxx>
Date:   Wed Oct 26 14:03:55 2022 -0500

    drm/amd: Fail the suspend if resources can't be evicted
    
    [ Upstream commit 8d4de331f1b24a22d18e3c6116aa25228cf54854 ]
    
    If a system does not have swap and memory is at 100% usage,
    amdgpu will fail to evict resources.  Currently the suspend
    carries on regardless and proceeds to reset the GPU:
    
    ```
    [drm] evicting device resources failed
    [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <vcn_v3_0> failed -12
    [drm] free PSP TMR buffer
    [TTM] Failed allocating page table
    [drm] evicting device resources failed
    amdgpu 0000:03:00.0: amdgpu: MODE1 reset
    amdgpu 0000:03:00.0: amdgpu: GPU mode1 reset
    amdgpu 0000:03:00.0: amdgpu: GPU smu mode1 reset
    ```
    
    At this point, if the suspend had actually succeeded, I think that
    amdgpu would have recovered because the GPU would have had its power
    cut off and restored.  However, the kernel fails to continue the
    suspend because of the memory pressure, and amdgpu fails to run the
    "resume" from the aborted suspend.
    
    ```
    ACPI: PM: Preparing to enter system sleep state S3
    SLUB: Unable to allocate memory on node -1, gfp=0xdc0(GFP_KERNEL|__GFP_ZERO)
      cache: Acpi-State, object size: 80, buffer size: 80, default order: 0, min order: 0
      node 0: slabs: 22, objs: 1122, free: 0
    ACPI Error: AE_NO_MEMORY, Could not update object reference count (20210730/utdelete-651)
    
    [drm:psp_hw_start [amdgpu]] *ERROR* PSP load kdb failed!
    [drm:psp_resume [amdgpu]] *ERROR* PSP resume failed
    [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* resume of IP block <psp> failed -62
    amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_resume failed (-62).
    PM: dpm_run_callback(): pci_pm_resume+0x0/0x100 returns -62
    amdgpu 0000:03:00.0: PM: failed to resume async: error -62
    ```
    
    To avoid this series of unfortunate events, fail amdgpu's suspend
    when the memory eviction fails.  This will let the system gracefully
    recover and the user can try suspend again when the memory pressure
    is relieved.
    
    Reported-by: post@xxxxxxxxxx
    Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2223
    Signed-off-by: Mario Limonciello <mario.limonciello@xxxxxxx>
    Reviewed-by: Alex Deucher <alexander.deucher@xxxxxxx>
    Acked-by: Christian König <christian.koenig@xxxxxxx>
    Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 9170aeaad93e..e0c960cc1d2e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4055,15 +4055,18 @@ void amdgpu_device_fini_sw(struct amdgpu_device *adev)
  * at suspend time.
  *
  */
-static void amdgpu_device_evict_resources(struct amdgpu_device *adev)
+static int amdgpu_device_evict_resources(struct amdgpu_device *adev)
 {
+	int ret;
+
 	/* No need to evict vram on APUs for suspend to ram or s2idle */
 	if ((adev->in_s3 || adev->in_s0ix) && (adev->flags & AMD_IS_APU))
-		return;
+		return 0;
 
-	if (amdgpu_ttm_evict_resources(adev, TTM_PL_VRAM))
+	ret = amdgpu_ttm_evict_resources(adev, TTM_PL_VRAM);
+	if (ret)
 		DRM_WARN("evicting device resources failed\n");
-
+	return ret;
 }
 
 /*
@@ -4113,7 +4116,9 @@ int amdgpu_device_suspend(struct drm_device *dev, bool fbcon)
 	if (!adev->in_s0ix)
 		amdgpu_amdkfd_suspend(adev, adev->in_runpm);
 
-	amdgpu_device_evict_resources(adev);
+	r = amdgpu_device_evict_resources(adev);
+	if (r)
+		return r;
 
 	amdgpu_fence_driver_hw_fini(adev);
 

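The control-flow change in the diff above can be tried outside the kernel with a minimal userspace sketch. Everything named `mock_*` here is a hypothetical stand-in invented for illustration; it is not the amdgpu API, and the struct does not reflect the real `struct amdgpu_device` layout. The sketch only shows the pattern the patch introduces: the eviction helper returns its error code instead of `void`, and the suspend path returns early on failure rather than continuing to later teardown steps.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the device state; not the real amdgpu struct. */
struct mock_device {
	bool is_apu;       /* plays the role of the AMD_IS_APU flag check */
	bool in_sleep;     /* plays the role of (in_s3 || in_s0ix) */
	int evict_result;  /* value the mocked VRAM eviction should return */
	bool hw_torn_down; /* set once the later suspend steps have run */
};

/* Mirrors the patched helper: report the eviction error to the caller. */
static int mock_evict_resources(struct mock_device *dev)
{
	assert(dev != NULL);

	/* APUs skip VRAM eviction for suspend-to-RAM / s2idle, as in the diff. */
	if (dev->in_sleep && dev->is_apu)
		return 0;

	if (dev->evict_result)
		fprintf(stderr, "evicting device resources failed\n");

	return dev->evict_result;
}

/* Mirrors the patched suspend path: bail out before touching hardware. */
static int mock_suspend(struct mock_device *dev)
{
	int r;

	r = mock_evict_resources(dev);
	if (r)
		return r; /* fail the suspend; the caller may retry later */

	dev->hw_torn_down = true; /* stands in for the remaining suspend steps */
	return 0;
}
```

With `evict_result` set to `-ENOMEM` (the `-12` seen in the log above) on a non-APU device, `mock_suspend()` returns that error and leaves `hw_torn_down` false; with `evict_result` set to `0` it proceeds and returns `0`.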

