Patch "drm/amdkfd: Fix lock dependency warning with srcu" has been added to the 6.6-stable tree

This is a note to let you know that I've just added the patch titled

    drm/amdkfd: Fix lock dependency warning with srcu

to the 6.6-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     drm-amdkfd-fix-lock-dependency-warning-with-srcu.patch
and it can be found in the queue-6.6 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit bd07bcb2946d8a01e3d500693764a6fa8754f410
Author: Philip Yang <Philip.Yang@xxxxxxx>
Date:   Fri Dec 29 15:19:25 2023 -0500

    drm/amdkfd: Fix lock dependency warning with srcu
    
    [ Upstream commit 2a9de42e8d3c82c6990d226198602be44f43f340 ]
    
    ======================================================
    WARNING: possible circular locking dependency detected
    6.5.0-kfd-yangp #2289 Not tainted
    ------------------------------------------------------
    kworker/0:2/996 is trying to acquire lock:
            (srcu){.+.+}-{0:0}, at: __synchronize_srcu+0x5/0x1a0
    
    but task is already holding lock:
            ((work_completion)(&svms->deferred_list_work)){+.+.}-{0:0}, at:
            process_one_work+0x211/0x560
    
    which lock already depends on the new lock.
    
    the existing dependency chain (in reverse order) is:
    
    -> #3 ((work_completion)(&svms->deferred_list_work)){+.+.}-{0:0}:
            __flush_work+0x88/0x4f0
            svm_range_list_lock_and_flush_work+0x3d/0x110 [amdgpu]
            svm_range_set_attr+0xd6/0x14c0 [amdgpu]
            kfd_ioctl+0x1d1/0x630 [amdgpu]
            __x64_sys_ioctl+0x88/0xc0
    
    -> #2 (&info->lock#2){+.+.}-{3:3}:
            __mutex_lock+0x99/0xc70
            amdgpu_amdkfd_gpuvm_restore_process_bos+0x54/0x740 [amdgpu]
            restore_process_helper+0x22/0x80 [amdgpu]
            restore_process_worker+0x2d/0xa0 [amdgpu]
            process_one_work+0x29b/0x560
            worker_thread+0x3d/0x3d0
    
    -> #1 ((work_completion)(&(&process->restore_work)->work)){+.+.}-{0:0}:
            __flush_work+0x88/0x4f0
            __cancel_work_timer+0x12c/0x1c0
            kfd_process_notifier_release_internal+0x37/0x1f0 [amdgpu]
            __mmu_notifier_release+0xad/0x240
            exit_mmap+0x6a/0x3a0
            mmput+0x6a/0x120
            do_exit+0x322/0xb90
            do_group_exit+0x37/0xa0
            __x64_sys_exit_group+0x18/0x20
            do_syscall_64+0x38/0x80
    
    -> #0 (srcu){.+.+}-{0:0}:
            __lock_acquire+0x1521/0x2510
            lock_sync+0x5f/0x90
            __synchronize_srcu+0x4f/0x1a0
            __mmu_notifier_release+0x128/0x240
            exit_mmap+0x6a/0x3a0
            mmput+0x6a/0x120
            svm_range_deferred_list_work+0x19f/0x350 [amdgpu]
            process_one_work+0x29b/0x560
            worker_thread+0x3d/0x3d0
    
    other info that might help us debug this:
    Chain exists of:
      srcu --> &info->lock#2 --> (work_completion)(&svms->deferred_list_work)
    
    Possible unsafe locking scenario:
    
            CPU0                    CPU1
            ----                    ----
            lock((work_completion)(&svms->deferred_list_work));
                            lock(&info->lock#2);
                            lock((work_completion)(&svms->deferred_list_work));
            sync(srcu);
    
    Signed-off-by: Philip Yang <Philip.Yang@xxxxxxx>
    Reviewed-by: Felix Kuehling <felix.kuehling@xxxxxxx>
    Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index a4c911fa1675..b51224a85a38 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -2343,8 +2343,10 @@ static void svm_range_deferred_list_work(struct work_struct *work)
 		mutex_unlock(&svms->lock);
 		mmap_write_unlock(mm);
 
-		/* Pairs with mmget in svm_range_add_list_work */
-		mmput(mm);
+		/* Pairs with mmget in svm_range_add_list_work. If dropping the
+		 * last mm refcount, schedule release work to avoid circular locking
+		 */
+		mmput_async(mm);
 
 		spin_lock(&svms->deferred_list_lock);
 	}

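The fix replaces mmput() with mmput_async(). When svm_range_deferred_list_work drops the last mm reference, the teardown (exit_mmap() and the MMU notifier release that waits in __synchronize_srcu(), chain #0 above) then runs from a separately scheduled work item instead of inside the deferred_list_work item that lockdep flags. Below is a minimal userspace C sketch of that "defer the final put" pattern. It is not kernel code. struct obj, obj_put(), obj_put_async(), run_pending_releases() and the in_caller_context flag are hypothetical stand-ins. The single-threaded pending list only models the ordering, not real workqueue concurrency.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag modelling "we are inside a context that must not wait
 * on teardown" (in the patch: svm_range_deferred_list_work is running). */
static bool in_caller_context;

/* Hypothetical stand-in for struct mm_struct. */
struct obj {
    int refs;                 /* stand-in for mm_users */
    bool released;            /* has teardown run? */
    bool released_in_caller;  /* did teardown run inside the caller's context? */
    struct obj *next_pending; /* link in the deferred-release list */
};

/* Objects whose last reference was dropped but whose teardown is deferred. */
static struct obj *pending_head;

/* Teardown. In the kernel this is where exit_mmap() reaches
 * __mmu_notifier_release() and waits in synchronize_srcu(). */
static void obj_release(struct obj *o)
{
    o->released = true;
    o->released_in_caller = in_caller_context;
}

/* Like mmput(): the last put tears the object down inline, so teardown
 * runs inside whatever context the caller is in. */
static void obj_put(struct obj *o)
{
    assert(o->refs > 0);
    if (--o->refs == 0)
        obj_release(o);
}

/* Like mmput_async(): the last put only queues the object; teardown is
 * left for a later, independent context. */
static void obj_put_async(struct obj *o)
{
    assert(o->refs > 0);
    if (--o->refs == 0) {
        o->next_pending = pending_head;
        pending_head = o;
    }
}

/* Plays the role of the separately scheduled worker: drains every
 * deferred release from outside the caller's context. */
static void run_pending_releases(void)
{
    while (pending_head != NULL) {
        struct obj *o = pending_head;
        pending_head = o->next_pending;
        obj_release(o);
    }
}
```

With obj_put(), the final reference triggers teardown while in_caller_context is still true, which mirrors the original mmput() call inside the work item. With obj_put_async(), teardown happens only when run_pending_releases() is called after the caller's context has ended.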