When allocating contiguous VRAM for a large RDMA buffer, TTM may take
more than 1 second to evict BOs. Because the KFD restore BO worker
reserves all KFD BOs, TTM cannot take the locks on the remaining KFD BOs
to evict them, which may cause the contiguous VRAM allocation to fail.

Increase the KFD restore BO wait time to 2 seconds, long enough for the
RDMA pin BO path to finish the contiguous VRAM allocation.

Signed-off-by: Philip Yang <Philip.Yang@xxxxxxx>
---
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index a81ef232fdef..c205e2d3acf9 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -698,7 +698,7 @@ struct qcm_process_device {
 /* KFD Memory Eviction */
 
 /* Approx. wait time before attempting to restore evicted BOs */
-#define PROCESS_RESTORE_TIME_MS 100
+#define PROCESS_RESTORE_TIME_MS 2000
 /* Approx. back off time if restore fails due to lack of memory */
 #define PROCESS_BACK_OFF_TIME_MS 100
 /* Approx. time before evicting the process again */
-- 
2.43.2