Hi Christian,

I wonder whether, when we encounter an ENOMEM error while pinning amdgpu BOs, we can retry the validation as below. With the following simple patch, the Abaqus pinning issue is no longer observed.

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 11cbf63..72a32f5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -902,11 +902,15 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 domain,
 		bo->placements[i].lpfn = lpfn;
 		bo->placements[i].flags |= TTM_PL_FLAG_NO_EVICT;
 	}
-
+retry:
 	r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
 	if (unlikely(r)) {
-		dev_err(adev->dev, "%p pin failed\n", bo);
-		goto error;
+		if (r == -ENOMEM){
+			goto retry;
+		} else {
+			dev_err(adev->dev, "%p pin failed\n", bo);
+			goto error;
+		}
 	}
 
 	bo->pin_count = 1;

Thanks,
Prike

From: Marek Olšák <maraeo@xxxxxxxxx>

[CAUTION: External Email]

This series fixes the OOM errors. However, if I torture the kernel driver more, I can get it to deadlock and end up with unkillable processes. I can also get an OOM error. I just ran the test 5 times:

AMD_DEBUG=testgdsmm glxgears &
AMD_DEBUG=testgdsmm glxgears &
AMD_DEBUG=testgdsmm glxgears &
AMD_DEBUG=testgdsmm glxgears &
AMD_DEBUG=testgdsmm glxgears

Marek

On Tue, May 14, 2019 at 8:31 AM Christian König <ckoenig.leichtzumerken@xxxxxxxxx> wrote:
_______________________________________________ amd-gfx mailing list amd-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/amd-gfx