Patch "drm/amdkfd: rm BO resv on validation to avoid deadlock" has been added to the 5.14-stable tree

This is a note to let you know that I've just added the patch titled

    drm/amdkfd: rm BO resv on validation to avoid deadlock

to the 5.14-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     drm-amdkfd-rm-bo-resv-on-validation-to-avoid-deadloc.patch
and it can be found in the queue-5.14 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 2e891119f6d50fabc3cea8bad756587e9c7d2a46
Author: Alex Sierra <alex.sierra@xxxxxxx>
Date:   Thu Oct 7 12:04:09 2021 -0500

    drm/amdkfd: rm BO resv on validation to avoid deadlock
    
    [ Upstream commit ec6abe831a843208e99a59adf108adba22166b3f ]
    
    This fixes a deadlock on BO reservations that occurs when SVM_BO
    evictions run while allocations in VRAM are performed concurrently.
    More specifically, while TTM waits for the fence to be signaled
    (ttm_bo_wait), it already holds the BO reservation. In parallel, the
    restore worker might be running, prefetching memory to VRAM. This
    also requires reserving the BO, but the worker takes the mmap
    semaphore first. The deadlock happens when the SVM_BO eviction
    worker kicks in and waits for the mmap semaphore held by the restore
    worker. That prevents the fence from being signaled, so the deadlock
    persists until TTM times out.
    
    We no longer need to hold the BO reservation during validation and
    mapping: the physical addresses are now taken from hmm_range_fault.
    We also take migrate_mutex to prevent range migration while
    validate_and_map updates the GPU page table.
    
    Signed-off-by: Alex Sierra <alex.sierra@xxxxxxx>
    Signed-off-by: Felix Kuehling <Felix.Kuehling@xxxxxxx>
    Reviewed-by: Philip Yang <philip.yang@xxxxxxx>
    Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
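
The circular wait described in the commit message can be pictured as a
small wait-for graph. The following is a userspace sketch only, not
kernel code: the enum values, struct, and function names are all
hypothetical and were chosen for illustration. It assumes three parties
(the TTM eviction path, the restore worker, and the SVM_BO eviction
worker) and checks whether their wait-for edges form a cycle, with and
without the restore worker reserving the SVM BO.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical actors; these names do not exist in the kernel. */
enum actor { TTM_EVICT, RESTORE_WORKER, SVM_BO_EVICT_WORKER, NUM_ACTORS };

/* waits_for[a][b] is true when actor a cannot proceed until actor b does. */
struct wait_graph {
	bool waits_for[NUM_ACTORS][NUM_ACTORS];
};

void build_graph(struct wait_graph *g, bool restore_reserves_svm_bo)
{
	memset(g, 0, sizeof(*g));
	/* TTM holds the BO reservation and waits for the eviction fence,
	 * which the SVM_BO eviction worker is expected to signal. */
	g->waits_for[TTM_EVICT][SVM_BO_EVICT_WORKER] = true;
	/* The eviction worker waits for the mmap lock held by the restore worker. */
	g->waits_for[SVM_BO_EVICT_WORKER][RESTORE_WORKER] = true;
	/* Before the patch, the restore worker also waits for the BO
	 * reservation that TTM is holding. */
	if (restore_reserves_svm_bo)
		g->waits_for[RESTORE_WORKER][TTM_EVICT] = true;
}

/* Depth-first search; state: 0 = unvisited, 1 = on current path, 2 = done. */
bool visit(const struct wait_graph *g, int node, int *state)
{
	state[node] = 1;
	for (int next = 0; next < NUM_ACTORS; next++) {
		if (!g->waits_for[node][next])
			continue;
		if (state[next] == 1)
			return true;	/* back edge: circular wait */
		if (state[next] == 0 && visit(g, next, state))
			return true;
	}
	state[node] = 2;
	return false;
}

/* Returns true when the modeled wait-for graph contains a cycle. */
bool deadlocks(bool restore_reserves_svm_bo)
{
	struct wait_graph g;
	int state[NUM_ACTORS] = {0};

	build_graph(&g, restore_reserves_svm_bo);
	for (int n = 0; n < NUM_ACTORS; n++)
		if (state[n] == 0 && visit(&g, n, state))
			return true;
	return false;
}
```

In this model, deadlocks(true) corresponds to the behavior before the
patch and deadlocks(false) to the behavior after it, where the restore
worker no longer adds the SVM BO to its reservation list. The model
ignores timeouts and the per-GPU page table BOs that are still reserved.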

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index e85035fd1ccb4..b6a19ac2bc607 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1303,7 +1303,7 @@ struct svm_validate_context {
 	struct svm_range *prange;
 	bool intr;
 	unsigned long bitmap[MAX_GPU_INSTANCE];
-	struct ttm_validate_buffer tv[MAX_GPU_INSTANCE+1];
+	struct ttm_validate_buffer tv[MAX_GPU_INSTANCE];
 	struct list_head validate_list;
 	struct ww_acquire_ctx ticket;
 };
@@ -1330,11 +1330,6 @@ static int svm_range_reserve_bos(struct svm_validate_context *ctx)
 		ctx->tv[gpuidx].num_shared = 4;
 		list_add(&ctx->tv[gpuidx].head, &ctx->validate_list);
 	}
-	if (ctx->prange->svm_bo && ctx->prange->ttm_res) {
-		ctx->tv[MAX_GPU_INSTANCE].bo = &ctx->prange->svm_bo->bo->tbo;
-		ctx->tv[MAX_GPU_INSTANCE].num_shared = 1;
-		list_add(&ctx->tv[MAX_GPU_INSTANCE].head, &ctx->validate_list);
-	}
 
 	r = ttm_eu_reserve_buffers(&ctx->ticket, &ctx->validate_list,
 				   ctx->intr, NULL);


