Am 24.05.19 um 23:34 schrieb Kuehling, Felix: > On 2019-05-23 5:06 a.m., Christian König wrote: >> [CAUTION: External Email] >> >> Leaving BOs on the LRU is harmless. We always did this for VM page table >> and per VM BOs. >> >> The key point is that BOs which couldn't be reserved can't be evicted. >> So what happened is that an application used basically all of VRAM >> during CS and because of this X server couldn't pin a BO for scanout. >> >> Now we keep the BOs on the LRU and modify TTM to block for the CS to >> complete, which in turn allows the X server to pin its BO for scanout. > > OK, let me rephrase that to make sure I understand it correctly. I think > the point is that eviction candidates come from an LRU list, so leaving > things on the LRU makes more BOs available for eviction and avoids OOM > situations. To take advantage of that, patch 6 adds the ability to wait > for reserved BOs when there is nothing easier to evict. > > ROCm applications like to use lots of memory. So it probably makes sense > for us to stop removing our BOs from the LRU as well while we > mass-validate our BOs in amdgpu_amdkfd_gpuvm_restore_process_bos. Well that would allow concurrent calls of amdgpu_amdkfd_gpuvm_restore_process_bos() to wait for each other. If that's what you want then yeah that certainly makes sense. Regards, Christian. > > Regards, > Felix > > >> Christian. >> >> Am 22.05.19 um 21:43 schrieb Kuehling, Felix: >>> Can you explain how this avoids OOM situations? When is it safe to leave >>> a reserved BO on the LRU list? Could we do the same thing in >>> amdgpu_amdkfd_gpuvm.c? And if we did, what would be the expected side >>> effects or consequences? >>> >>> Thanks, >>> Felix >>> >>> On 2019-05-22 8:59 a.m., Christian König wrote: >>>> [CAUTION: External Email] >>>> >>>> This avoids OOM situations when we have lots of threads >>>> submitting at the same time. >>>> >>>> v3: apply this to the whole driver, not just CS >>>> >>>> Signed-off-by: Christian König <christian.koenig@xxxxxxx> >>>> --- >>>> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +- >>>> drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c | 2 +- >>>> drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 4 ++-- >>>> drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 2 +- >>>> 4 files changed, 5 insertions(+), 5 deletions(-) >>>> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c >>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c >>>> index 20f2955d2a55..3e2da24cd17a 100644 >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c >>>> @@ -648,7 +648,7 @@ static int amdgpu_cs_parser_bos(struct >>>> amdgpu_cs_parser *p, >>>> } >>>> >>>> r = ttm_eu_reserve_buffers(&p->ticket, &p->validated, true, >>>> - &duplicates, true); >>>> + &duplicates, false); >>>> if (unlikely(r != 0)) { >>>> if (r != -ERESTARTSYS) >>>> DRM_ERROR("ttm_eu_reserve_buffers >>>> failed.\n"); >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c >>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c >>>> index 06f83cac0d3a..f660628e6af9 100644 >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c >>>> @@ -79,7 +79,7 @@ int amdgpu_map_static_csa(struct amdgpu_device >>>> *adev, struct amdgpu_vm *vm, >>>> list_add(&csa_tv.head, &list); >>>> amdgpu_vm_get_pd_bo(vm, &list, &pd); >>>> >>>> - r = ttm_eu_reserve_buffers(&ticket, &list, true, NULL, true); >>>> + r = ttm_eu_reserve_buffers(&ticket, &list, true, NULL, false); >>>> if (r) { >>>> DRM_ERROR("failed to reserve CSA,PD BOs: >>>> err=%d\n", r); >>>> return r; >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c >>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c >>>> index d513a5ad03dd..ed25a4e14404 100644 >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c >>>> @@ -171,7 +171,7 @@ void amdgpu_gem_object_close(struct >>>> drm_gem_object *obj, >>>> >>>> amdgpu_vm_get_pd_bo(vm, &list, &vm_pd); >>>> >>>> - r = ttm_eu_reserve_buffers(&ticket, &list, false, >>>> &duplicates, true); >>>> + r = ttm_eu_reserve_buffers(&ticket, &list, false, >>>> &duplicates, false); >>>> if (r) { >>>> dev_err(adev->dev, "leaking bo va because " >>>> "we fail to reserve bo (%d)\n", r); >>>> @@ -608,7 +608,7 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, >>>> void *data, >>>> >>>> amdgpu_vm_get_pd_bo(&fpriv->vm, &list, &vm_pd); >>>> >>>> - r = ttm_eu_reserve_buffers(&ticket, &list, true, >>>> &duplicates, true); >>>> + r = ttm_eu_reserve_buffers(&ticket, &list, true, >>>> &duplicates, false); >>>> if (r) >>>> goto error_unref; >>>> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h >>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h >>>> index c430e8259038..d60593cc436e 100644 >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h >>>> @@ -155,7 +155,7 @@ static inline int amdgpu_bo_reserve(struct >>>> amdgpu_bo *bo, bool no_intr) >>>> struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev); >>>> int r; >>>> >>>> - r = ttm_bo_reserve(&bo->tbo, !no_intr, false, NULL); >>>> + r = __ttm_bo_reserve(&bo->tbo, !no_intr, false, NULL); >>>> if (unlikely(r != 0)) { >>>> if (r != -ERESTARTSYS) >>>> dev_err(adev->dev, "%p reserve failed\n", >>>> bo); >>>> -- >>>> 2.17.1 >>>> >>>> _______________________________________________ >>>> amd-gfx mailing list >>>> amd-gfx@xxxxxxxxxxxxxxxxxxxxx >>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx _______________________________________________ amd-gfx mailing list amd-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/amd-gfx