Patch "drm/amdgpu: fix kernel page fault issue by ras recovery on sGPU" has been added to the 5.7-stable tree

This is a note to let you know that I've just added the patch titled

    drm/amdgpu: fix kernel page fault issue by ras recovery on sGPU

to the 5.7-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     drm-amdgpu-fix-kernel-page-fault-issue-by-ras-recove.patch
and it can be found in the queue-5.7 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 5a211368384eb937b650d165da935777996f9977
Author: Guchun Chen <guchun.chen@xxxxxxx>
Date:   Thu Apr 16 23:41:07 2020 +0800

    drm/amdgpu: fix kernel page fault issue by ras recovery on sGPU
    
    [ Upstream commit 12c17b9d62663c14a5343d6742682b3e67280754 ]
    
    When running RAS uncorrectable error injection and triggering a GPU
    reset on sGPU, the issue below is observed. It is caused by accessing
    the list before it has been initialized.
    
    [   80.047227] BUG: unable to handle page fault for address: ffffffffc0f4f750
    [   80.047300] #PF: supervisor write access in kernel mode
    [   80.047351] #PF: error_code(0x0003) - permissions violation
    [   80.047404] PGD 12c20e067 P4D 12c20e067 PUD 12c210067 PMD 41c4ee067 PTE 404316061
    [   80.047477] Oops: 0003 [#1] SMP PTI
    [   80.047516] CPU: 7 PID: 377 Comm: kworker/7:2 Tainted: G           OE     5.4.0-rc7-guchchen #1
    [   80.047594] Hardware name: System manufacturer System Product Name/TUF Z370-PLUS GAMING II, BIOS 0411 09/21/2018
    [   80.047888] Workqueue: events amdgpu_ras_do_recovery [amdgpu]
    
    Signed-off-by: Guchun Chen <guchun.chen@xxxxxxx>
    Reviewed-by: John Clements <John.Clements@xxxxxxx>
    Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index b0aa4e1ed4df7..cd18596b47d33 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -1444,9 +1444,10 @@ static void amdgpu_ras_do_recovery(struct work_struct *work)
 	struct amdgpu_hive_info *hive = amdgpu_get_xgmi_hive(adev, false);
 
 	/* Build list of devices to query RAS related errors */
-	if  (hive && adev->gmc.xgmi.num_physical_nodes > 1) {
+	if  (hive && adev->gmc.xgmi.num_physical_nodes > 1)
 		device_list_handle = &hive->device_list;
-	} else {
+	else {
+		INIT_LIST_HEAD(&device_list);
 		list_add_tail(&adev->gmc.xgmi.head, &device_list);
 		device_list_handle = &device_list;
 	}
