On 2/13/19 2:10 PM, Grodzovsky, Andrey wrote: > > On 2/13/19 2:00 PM, Kazlauskas, Nicholas wrote: >> On 2/13/19 1:58 PM, Andrey Grodzovsky wrote: >>> When ring hang happens amdgpu_dm_commit_planes during flip is holding >>> the BO reserved and then stack waiting for fences to signal in >>> reservation_object_wait_timeout_rcu (which won't signal because there >>> was a hnag). Then when we try to shutdown display block during reset >>> recovery from drm_atomic_helper_suspend we also try to reserve the BO >>> from dm_plane_helper_cleanup_fb ending in deadlock. >>> Also remove useless WARN_ON >>> >>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx> >>> --- >>> drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 19 +++++++++++++------ >>> 1 file changed, 13 insertions(+), 6 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c >>> index acc4ff8..f8dec36 100644 >>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c >>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c >>> @@ -4802,14 +4802,21 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state, >>> */ >>> abo = gem_to_amdgpu_bo(fb->obj[0]); >>> r = amdgpu_bo_reserve(abo, true); >>> - if (unlikely(r != 0)) { >>> + if (unlikely(r != 0)) >>> DRM_ERROR("failed to reserve buffer before flip\n"); >>> - WARN_ON(1); >>> - } >>> >>> - /* Wait for all fences on this FB */ >>> - WARN_ON(reservation_object_wait_timeout_rcu(abo->tbo.resv, true, false, >>> - MAX_SCHEDULE_TIMEOUT) < 0); >>> + /* >>> + * Wait for all fences on this FB. Do limited wait to avoid >>> + * deadlock during GPU reset when this fence will not signal >>> + * but we hold reservation lock for the BO. >>> + */ >>> + r = reservation_object_wait_timeout_rcu(abo->tbo.resv, >>> + true, false, >>> + msecs_to_jiffies(5000)); >>> + if (unlikely(r == 0)) >>> + DRM_ERROR("Waiting for fences timed out."); >>> + >>> + >>> >>> amdgpu_bo_get_tiling_flags(abo, &tiling_flags); >>> >>> >> Is it safe that we're just continuing like this? It's probably better to >> just unreserve the buffer and continue to the next plane, no? >> >> Nicholas Kazlauskas > > As far as I see it should be safe as you are simply flipping to a buffer > for which rendering hasn't finished (or stack actually in this case) so > you might see visual corruption but that the least of your problems if > after 5s the BO still not finalized for presentation, the system is > already probably in very bad shape. Also, in case we do want to do > error handling we should also take care of amdgpu_bo_reserve failure > just before that. > > Andrey > > Yeah, I guess this whole blocks needs to be cleaned up in that case. This is a good first step at least. Technically reservation_object_wait_timeout_rcu will return < 0 when it's been interrupted too as an error code but I guess that will just be silently ignored here. If you want you can change the condition to: if (unlikely(r >= 0)) DRM_ERROR("Waiting for FB fence failed: id=%d res=%d\n", plane->id, r); But with or without that change this patch is: Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@xxxxxxx> _______________________________________________ amd-gfx mailing list amd-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/amd-gfx