Re: [PATCH] drm/amdkfd: Don't over commit vram while xnack is off

Felix Kuehling <felix.kuehling@xxxxxxx> · Thu, 20 Feb 2025 09:27:09 -0500

On 2025-02-20 7:00, Emily Deng wrote:
> For xnack is off, the application should ensure the vram not overcommit.

This is incorrect. SVM ranges in VRAM can always be evicted to system memory, even with XNACK off. During the migration, the MMU notifier stops the user mode queues. We apply system memory limits to SVM to ensure that all SVM ranges fit into system memory. VRAM is used opportunistically while it is available.

A non-SVM VRAM allocation should be able to evict SVM ranges belonging to the same process. This does not stop forward progress of the application, because the application can continue after the migration with its data in system memory. This patch breaks that.
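
As a rough illustration of the accounting described above, here is a toy user-space model. It is only a sketch: struct mem_model, struct svm_range and all function names are made up for this example and do not correspond to the actual amdkfd code. It assumes every SVM range is charged against a system memory limit when it is created, so an eviction from VRAM always has somewhere to go, and a non-SVM VRAM allocation can reclaim VRAM from SVM ranges of the same process.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model, not the amdkfd data structures. */
enum placement { IN_SYSMEM, IN_VRAM };

struct svm_range {
	uint64_t size;
	enum placement where;
};

struct mem_model {
	uint64_t sysmem_limit;    /* cap on system memory charged to SVM */
	uint64_t sysmem_reserved; /* charged per SVM range, wherever it lives */
	uint64_t vram_total;
	uint64_t vram_used;
};

/* Creating a range always charges system memory, so eviction cannot fail. */
static bool svm_range_add(struct mem_model *m, struct svm_range *r,
			  uint64_t size)
{
	if (size > m->sysmem_limit - m->sysmem_reserved)
		return false;
	m->sysmem_reserved += size;
	r->size = size;
	r->where = IN_SYSMEM;
	return true;
}

/* Opportunistic placement: use VRAM only if it is free right now. */
static bool svm_range_try_vram(struct mem_model *m, struct svm_range *r)
{
	if (r->where == IN_VRAM)
		return true;
	if (r->size > m->vram_total - m->vram_used)
		return false;
	m->vram_used += r->size;
	r->where = IN_VRAM;
	return true;
}

/* Eviction needs no new reservation: the system memory is already charged. */
static void svm_range_evict(struct mem_model *m, struct svm_range *r)
{
	if (r->where != IN_VRAM)
		return;
	assert(m->vram_used >= r->size);
	m->vram_used -= r->size;
	r->where = IN_SYSMEM;
}

/* A non-SVM VRAM allocation evicts SVM ranges until it fits. */
static bool vram_alloc_non_svm(struct mem_model *m, uint64_t size,
			       struct svm_range *ranges, size_t n)
{
	for (size_t i = 0; i < n && size > m->vram_total - m->vram_used; i++)
		svm_range_evict(m, &ranges[i]);
	if (size > m->vram_total - m->vram_used)
		return false;
	m->vram_used += size;
	return true;
}
```

In this model the SVM range is still usable after vram_alloc_non_svm() succeeds, just from system memory. Refusing the eviction would instead make the non-SVM allocation fail even though nothing prevents the range from moving.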

Regards,
  Felix

>
> Signed-off-by: Emily Deng <Emily.Deng@xxxxxxx>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
> index 1ef758ac5076..1aad27994452 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
> @@ -171,11 +171,17 @@ static void amdkfd_fence_release(struct dma_fence *f)
>  bool amdkfd_fence_check_mm(struct dma_fence *f, struct mm_struct *mm)
>  {
>  	struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
> +	struct kfd_process *p;
>  
>  	if (!fence)
>  		return false;
>  	else if (fence->mm == mm  && !fence->svm_bo)
>  		return true;
> +	else if (fence->svm_bo) {
> +		p = kfd_lookup_process_by_mm(mm);
> +		if (p && !p->xnack_enabled)
> +			return true;
> +	}
>  
>  	return false;
>  }