[AMD Official Use Only - General]

Oh yep, pinned BOs are moved to another LRU list, so the eviction fails for some other reason. I will change the comments in the patch.

The problem is that eviction can fail for many reasons, e.g. a BO is locked.
AFAIK, kfd stops the queues and flushes some evict/restore work in its suspend callback, so the first eviction, which runs before the kfd callback, is likely to fail.

-----Original Message-----
From: Christian König <ckoenig.leichtzumerken@xxxxxxxxx>
Sent: Friday, September 8, 2023 2:49 PM
To: Pan, Xinhui <Xinhui.Pan@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Koenig, Christian <Christian.Koenig@xxxxxxx>; Fan, Shikang <Shikang.Fan@xxxxxxx>
Subject: Re: [PATCH] drm/amdgpu: Ignore first evction failure during suspend

On 08.09.23 at 05:39, xinhui pan wrote:
> Some BOs might be pinned. So the first eviction's failure will abort
> the suspend sequence. These pinned BOs will be unpinned afterwards
> during suspend.

That doesn't make much sense since pinned BOs don't cause eviction failure here.

What exactly is the error code you see?

Christian.

>
> Actually it has evicted most BOs, so that should still work fine in
> sriov full access mode.
>
> Fixes: 47ea20762bb7 ("drm/amdgpu: Add an extra evict_resource call during device_suspend.")
> Signed-off-by: xinhui pan <xinhui.pan@xxxxxxx>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 9 +++++----
>   1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 5c0e2b766026..39af526cdbbe 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -4148,10 +4148,11 @@ int amdgpu_device_suspend(struct drm_device *dev, bool fbcon)
>
>       adev->in_suspend = true;
>
> -     /* Evict the majority of BOs before grabbing the full access */
> -     r = amdgpu_device_evict_resources(adev);
> -     if (r)
> -             return r;
> +     /* Try to evict the majority of BOs before grabbing the full access
> +      * Ignore the ret val at first place as we will unpin some BOs if any
> +      * afterwards.
> +      */
> +     (void)amdgpu_device_evict_resources(adev);
>
>       if (amdgpu_sriov_vf(adev)) {
>               amdgpu_virt_fini_data_exchange(adev);
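
To make the ordering argument above concrete, here is a small user-space model of the two behaviours being discussed. It is only a sketch under stated assumptions: `struct fake_dev`, `fake_evict_resources()`, `fake_stop_queues()` and `fake_suspend()` are hypothetical stand-ins invented for illustration, not amdgpu or TTM functions, and "busy" BOs simply model buffers that cannot be evicted while queues are still running (for example because they are locked).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of device state; NOT the real struct amdgpu_device. */
struct fake_dev {
	bool queues_running;	/* queues not stopped yet, some BOs may be locked */
	int resident_bos;	/* BOs still resident in device memory */
	int busy_bos;		/* BOs that cannot be evicted while queues run */
};

/* Evict everything that can be evicted right now; -EBUSY if anything is left. */
static int fake_evict_resources(struct fake_dev *dev)
{
	int evictable;

	assert(dev->busy_bos <= dev->resident_bos);

	evictable = dev->queues_running ?
		dev->resident_bos - dev->busy_bos : dev->resident_bos;
	dev->resident_bos -= evictable;

	return dev->resident_bos ? -EBUSY : 0;
}

/* Models the later suspend step that stops the queues (assumed behaviour). */
static void fake_stop_queues(struct fake_dev *dev)
{
	dev->queues_running = false;
}

/*
 * strict_first_pass == true models the code before the patch: a failure of
 * the early eviction aborts suspend. false models the patch: the early pass
 * is best effort and only the eviction after stopping the queues is fatal.
 */
static int fake_suspend(struct fake_dev *dev, bool strict_first_pass)
{
	int r;

	r = fake_evict_resources(dev);
	if (r && strict_first_pass)
		return r;

	fake_stop_queues(dev);

	/* By now nothing should block eviction, so an error here is real. */
	return fake_evict_resources(dev);
}
```

In this model, a device with 10 resident BOs of which 3 are busy fails suspend with -EBUSY in the strict variant, while the best-effort variant evicts 7 BOs early, the remaining 3 after the queues are stopped, and returns 0.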