On 2/14/19 4:05 AM, Christian König wrote: > Am 13.02.19 um 19:58 schrieb Andrey Grodzovsky: >> When ring hang happens amdgpu_dm_commit_planes during flip is holding >> the BO reserved and then stack waiting for fences to signal in >> reservation_object_wait_timeout_rcu (which won't signal because there >> was a hnag). Then when we try to shutdown display block during reset >> recovery from drm_atomic_helper_suspend we also try to reserve the BO >> from dm_plane_helper_cleanup_fb ending in deadlock. >> Also remove useless WARN_ON > > Well it is good that you pointed this out, but there are more problems > than just waiting wile the BO is reserved here. > >> >> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx> >> --- >> drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 19 >> +++++++++++++------ >> 1 file changed, 13 insertions(+), 6 deletions(-) >> >> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c >> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c >> index acc4ff8..f8dec36 100644 >> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c >> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c >> @@ -4802,14 +4802,21 @@ static void amdgpu_dm_commit_planes(struct >> drm_atomic_state *state, >> */ >> abo = gem_to_amdgpu_bo(fb->obj[0]); >> r = amdgpu_bo_reserve(abo, true); > > Well why do we reserve the BO in the first place? As the name > indicates reservation_object_wait_timeout_rcu() just uses rcu to wait > for the BO to be idle, no need to actually reserve it at all. That a good point, I honestly can't remember any more why it's here... It does look unnecessary given that we are not validating the BO here. > >> - if (unlikely(r != 0)) { >> + if (unlikely(r != 0)) >> DRM_ERROR("failed to reserve buffer before flip\n"); >> - WARN_ON(1); >> - } >> - /* Wait for all fences on this FB */ >> - WARN_ON(reservation_object_wait_timeout_rcu(abo->tbo.resv, true, >> false, >> - MAX_SCHEDULE_TIMEOUT) < 0); >> + /* >> + * Wait for all fences on this FB. Do limited wait to avoid >> + * deadlock during GPU reset when this fence will not >> signal >> + * but we hold reservation lock for the BO. >> + */ >> + r = reservation_object_wait_timeout_rcu(abo->tbo.resv, >> + true, false, > > Does this waiting happen in a work item or process context? If it's > process context we should actually try to wait interruptible here. Work item only. Andrey > > Regards, > Christian. > >> + msecs_to_jiffies(5000)); >> + if (unlikely(r == 0)) >> + DRM_ERROR("Waiting for fences timed out."); >> + >> + >> amdgpu_bo_get_tiling_flags(abo, &tiling_flags); > _______________________________________________ amd-gfx mailing list amd-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/amd-gfx