Re: amd/amdkfd: Fix a memory limit issue

"Limonciello, Mario" <mario.limonciello@xxxxxxx> · Mon, 14 Nov 2022 14:58:35 -0600

On 11/14/2022 12:45, Eric Huang wrote:
This resolves a regression in which VRAM allocation fails
because the application appears to have no free memory. The
cause is the vram_pin_size check that was added to the memory
limit: the application pins memory for PeerDirect, and KFD
should not count that pinned memory against the limit.
Removing vram_pin_size from the check resolves it.

Any idea when the regression was introduced? Could you narrow it down
to a commit?

If so, it would be great to include a "Fixes" tag so that this can
also be backported to the relevant stable kernels that have the regression.


Signed-off-by: Eric Huang <jinhuieric.huang@xxxxxxx>
Reviewed-by: Felix Kuehling <Felix.Kuehling@xxxxxxx>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 4 +---
  1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index db772942f7a6..fb1bb593312e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -172,9 +172,7 @@ int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
  	    (kfd_mem_limit.ttm_mem_used + ttm_mem_needed >
  	     kfd_mem_limit.max_ttm_mem_limit) ||
  	    (adev && adev->kfd.vram_used + vram_needed >
-	     adev->gmc.real_vram_size -
-	     atomic64_read(&adev->vram_pin_size) -
-	     reserved_for_pt)) {
+	     adev->gmc.real_vram_size - reserved_for_pt)) {
  		ret = -ENOMEM;
  		goto release;
  	}
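
For illustration only, the effect of the hunk above on the VRAM check can be
sketched in standalone userspace C. This is not the kernel code: struct
vram_state, example_state(), reserve_vram_old() and reserve_vram_new() are
hypothetical names, the sizes are made up, and the sketch assumes the memory
pinned for PeerDirect is already included in the KFD usage counter, which is
what makes subtracting vram_pin_size count it a second time.

```c
#include <errno.h>
#include <stdint.h>

#define GIB (1ULL << 30)

/* Hypothetical stand-in for the fields the real check reads. */
struct vram_state {
	uint64_t real_vram_size;   /* total VRAM on the device */
	uint64_t vram_pin_size;    /* bytes currently pinned */
	uint64_t kfd_vram_used;    /* bytes already reserved through KFD */
	uint64_t reserved_for_pt;  /* bytes held back for page tables */
};

/* Check before the patch: pinned VRAM shrinks the limit. */
int reserve_vram_old(const struct vram_state *s, uint64_t vram_needed)
{
	uint64_t limit = s->real_vram_size - s->vram_pin_size -
			 s->reserved_for_pt;

	return (s->kfd_vram_used + vram_needed > limit) ? -ENOMEM : 0;
}

/* Check after the patch: only the page-table reserve shrinks the limit. */
int reserve_vram_new(const struct vram_state *s, uint64_t vram_needed)
{
	uint64_t limit = s->real_vram_size - s->reserved_for_pt;

	return (s->kfd_vram_used + vram_needed > limit) ? -ENOMEM : 0;
}

/*
 * Made-up scenario: 8 GiB device, 6 GiB already allocated through KFD,
 * 3 GiB of that pinned for PeerDirect, 512 MiB reserved for page tables.
 */
struct vram_state example_state(void)
{
	struct vram_state s = {
		.real_vram_size  = 8 * GIB,
		.vram_pin_size   = 3 * GIB,
		.kfd_vram_used   = 6 * GIB,
		.reserved_for_pt = GIB / 2,
	};

	return s;
}
```

With these numbers, a further 1 GiB request brings usage to 7 GiB. The old
check compares that against 8 - 3 - 0.5 = 4.5 GiB and returns -ENOMEM, while
the new check compares against 8 - 0.5 = 7.5 GiB and lets the request through.
A 2 GiB request would still be rejected by the new check.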