On 2/14/19 10:54 AM, Kazlauskas, Nicholas wrote: > On 2/14/19 10:42 AM, Grodzovsky, Andrey wrote: >> On 2/14/19 4:05 AM, Christian König wrote: >>> Am 13.02.19 um 19:58 schrieb Andrey Grodzovsky: >>>> When ring hang happens amdgpu_dm_commit_planes during flip is holding >>>> the BO reserved and then stack waiting for fences to signal in >>>> reservation_object_wait_timeout_rcu (which won't signal because there >>>> was a hnag). Then when we try to shutdown display block during reset >>>> recovery from drm_atomic_helper_suspend we also try to reserve the BO >>>> from dm_plane_helper_cleanup_fb ending in deadlock. >>>> Also remove useless WARN_ON >>> Well it is good that you pointed this out, but there are more problems >>> than just waiting wile the BO is reserved here. >>> >>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx> >>>> --- >>>> drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 19 >>>> +++++++++++++------ >>>> 1 file changed, 13 insertions(+), 6 deletions(-) >>>> >>>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c >>>> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c >>>> index acc4ff8..f8dec36 100644 >>>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c >>>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c >>>> @@ -4802,14 +4802,21 @@ static void amdgpu_dm_commit_planes(struct >>>> drm_atomic_state *state, >>>> */ >>>> abo = gem_to_amdgpu_bo(fb->obj[0]); >>>> r = amdgpu_bo_reserve(abo, true); >>> Well why do we reserve the BO in the first place? As the name >>> indicates reservation_object_wait_timeout_rcu() just uses rcu to wait >>> for the BO to be idle, no need to actually reserve it at all. >> That a good point, I honestly can't remember any more why it's here... >> It does look unnecessary given that we are not validating the BO here. > I think we reserve it to grab the tiling flags to update the plane > address (and some other bits of the plane state with a patch I'm > currently working on). I'm guessing that we still need the reservation > so nothing else tries to grab it after it becomes idle, but I'm not > overly familiar with that bit. > > Nicholas Kazlauskas I don't think so, first of all we only hold the reservation during the wait for fences and then release it, secondly, BO reservation is a thin wrapper around ttm_bo_reserve which just protects the ttm BO internal structures and removes the BO from eviction list. I think you can try to remove it and run some stress testing to see there is no visible corruption. Andrey > >> >>>> - if (unlikely(r != 0)) { >>>> + if (unlikely(r != 0)) >>>> DRM_ERROR("failed to reserve buffer before flip\n"); >>>> - WARN_ON(1); >>>> - } >>>> - /* Wait for all fences on this FB */ >>>> - WARN_ON(reservation_object_wait_timeout_rcu(abo->tbo.resv, true, >>>> false, >>>> - MAX_SCHEDULE_TIMEOUT) < 0); >>>> + /* >>>> + * Wait for all fences on this FB. Do limited wait to avoid >>>> + * deadlock during GPU reset when this fence will not >>>> signal >>>> + * but we hold reservation lock for the BO. >>>> + */ >>>> + r = reservation_object_wait_timeout_rcu(abo->tbo.resv, >>>> + true, false, >>> Does this waiting happen in a work item or process context? If it's >>> process context we should actually try to wait interruptible here. >> >> Work item only. >> >> Andrey >> >> >>> Regards, >>> Christian. >>> >>>> + msecs_to_jiffies(5000)); >>>> + if (unlikely(r == 0)) >>>> + DRM_ERROR("Waiting for fences timed out."); >>>> + >>>> + >>>> amdgpu_bo_get_tiling_flags(abo, &tiling_flags); _______________________________________________ amd-gfx mailing list amd-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/amd-gfx