Patch "drm/amdkfd: avoid recursive lock in migrations back to RAM" has been added to the 5.15-stable tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This is a note to let you know that I've just added the patch titled

    drm/amdkfd: avoid recursive lock in migrations back to RAM

to the 5.15-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     drm-amdkfd-avoid-recursive-lock-in-migrations-back-t.patch
and it can be found in the queue-5.15 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 258c2743e53efda174f8ee4133ee4ff0e7b6ac2f
Author: Alex Sierra <alex.sierra@xxxxxxx>
Date:   Fri Oct 29 13:30:40 2021 -0500

    drm/amdkfd: avoid recursive lock in migrations back to RAM
    
    [ Upstream commit a6283010e2907a5576f96b839e1a1c82659f137c ]
    
    [Why]:
    When we call hmm_range_fault to map memory after a migration, we don't
    expect memory to be migrated again as a result of hmm_range_fault. The
    driver ensures that all memory is in GPU-accessible locations so that
    no migration should be needed. However, there is one corner case where
    hmm_range_fault can unexpectedly cause a migration from DEVICE_PRIVATE
    back to system memory due to a write-fault when a system memory page in
    the same range was mapped read-only (e.g. COW). Ranges with individual
    pages in different locations are usually the result of failed page
    migrations (e.g. page lock contention). The unexpected migration back
    to system memory causes a deadlock from recursive locking in our
    driver.
    
    [How]:
    Creating a task reference new member under svm_range_list struct.
    Setting this with "current" reference, right before the hmm_range_fault
    is called. This member is checked against "current" reference at
    svm_migrate_to_ram callback function. If equal, the migration will be
    ignored.
    
    Signed-off-by: Alex Sierra <alex.sierra@xxxxxxx>
    Reviewed-by: Felix Kuehling <Felix.Kuehling@xxxxxxx>
    Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx>
    Stable-dep-of: 5b994354af3c ("drm/amdkfd: Fix NULL pointer dereference in svm_migrate_to_ram()")
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index 4a16e3c257b9..a458c19b371a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -796,6 +796,11 @@ static vm_fault_t svm_migrate_to_ram(struct vm_fault *vmf)
 		pr_debug("failed find process at fault address 0x%lx\n", addr);
 		return VM_FAULT_SIGBUS;
 	}
+	if (READ_ONCE(p->svms.faulting_task) == current) {
+		pr_debug("skipping ram migration\n");
+		kfd_unref_process(p);
+		return 0;
+	}
 	addr >>= PAGE_SHIFT;
 	pr_debug("CPU page fault svms 0x%p address 0x%lx\n", &p->svms, addr);
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 6d8f9bb2d905..47ec820cae72 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -755,6 +755,7 @@ struct svm_range_list {
 	atomic_t			evicted_ranges;
 	struct delayed_work		restore_work;
 	DECLARE_BITMAP(bitmap_supported, MAX_GPU_INSTANCE);
+	struct task_struct 		*faulting_task;
 };
 
 /* Process data */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 74e6f613be02..22a70aaccf13 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1489,9 +1489,11 @@ static int svm_range_validate_and_map(struct mm_struct *mm,
 
 		next = min(vma->vm_end, end);
 		npages = (next - addr) >> PAGE_SHIFT;
+		WRITE_ONCE(p->svms.faulting_task, current);
 		r = amdgpu_hmm_range_get_pages(&prange->notifier, mm, NULL,
 					       addr, npages, &hmm_range,
 					       readonly, true, owner);
+		WRITE_ONCE(p->svms.faulting_task, NULL);
 		if (r) {
 			pr_debug("failed %d to get svm range pages\n", r);
 			goto unreserve_out;



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux