Sure, that would probably be the solution. One missing detail here
(besides confirming with the debug prints that this is the scenario we
are hitting) is WHY we even get stuck in
reservation_object_wait_timeout_rcu. In amdgpu_device_pre_asic_reset
(during GPU reset) we first force completion of all outstanding HW
fences through amdgpu_fence_driver_force_completion BEFORE proceeding
to suspend the IP blocks in amdgpu_device_ip_suspend. One possible
explanation is that the fence attached to the BO is a scheduler fence
(SW fence) and not the backing HW fence. I will be able to verify this
with some fence traces after confirming that the deadlock is indeed the
one I described.

Andrey

On 2/12/19 1:29 PM, Kazlauskas, Nicholas wrote:
> The MAX_SCHEDULE_TIMEOUT is probably not a good idea on the wait in DM.
>
> I wonder if we could just do a shorter wait and skip the FB
> update/programming if it fails after some reasonable amount of time.
>
> This would still allow recovery to happen at least, even if the display
> isn't showing the right buffer.
>
> Nicholas Kazlauskas
>
> On 2/12/19 12:46 PM, Grodzovsky, Andrey wrote:
>> I suspect the issue is that amdgpu_dm_do_flip is holding the BO reserved
>> and then gets stuck waiting for fences to signal in
>> reservation_object_wait_timeout_rcu (which won't signal because there
>> was a VM_FAULT). Then when we try to shut down the display block during
>> reset recovery from drm_atomic_helper_suspend we also try to reserve the
>> BO, probably from dm_plane_helper_cleanup_fb, ending in deadlock.
>>
>> To confirm, I am attaching some printks around the BO reservation -
>> please apply and rerun.
>>
>> Also, it is probably a good idea to open an FDO ticket on this instead
>> of using amd-gfx.
>>
>> Andrey
>>
>>
>> On 2/12/19 10:49 AM, Mikhail Gavrilov wrote:
>>> On Tue, 12 Feb 2019 at 20:23, Grodzovsky, Andrey
>>> <Andrey.Grodzovsky@xxxxxxx> wrote:
>>>> It should recover you - so this looks like a bug.
>>>> I noticed in one of the call traces this - drm_atomic_helper_suspend -
>>>> which points to the system going into sleep mode. Is that what
>>>> happened, did it hang when the system tried to sleep?
>>>>
>>> It's weird because the computer did not enter sleep mode. I am sure.
>>> Steps to reproduce:
>>> 1. Launch Shadow of the Tomb Raider on Proton
>>> 2. Wait some time until the mouse stops responding
>>> 3. Dump gfx, waves and all other dumps including dmesg
>>>
>>> And of course the power button (the button which enters sleep mode) was
>>> not pressed.
>>>
>>> So do the new dumps have any new useful info? Or are they pointless?
>>> --
>>> Best Regards,
>>> Mike Gavrilov.
>>>
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx@xxxxxxxxxxxxxxxxxxxxx
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
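
For illustration only, the "shorter wait and skip the FB update" idea
from the thread can be sketched in userspace C with pthreads. This is
not amdgpu or DM code: struct fake_bo, struct fake_fence and all
fake_* functions are hypothetical stand-ins, the mutex plays the role
of the BO reservation and the condition variable plays the role of the
fence. The point is only that a bounded wait lets the flip path drop
the reservation when the fence never signals, so a reset path that
needs the same reservation is not blocked forever.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for a fence: a flag guarded by a mutex/condvar. */
struct fake_fence {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool signaled;
};

/* Hypothetical stand-in for a BO: a reservation lock plus one fence. */
struct fake_bo {
	pthread_mutex_t resv;
	struct fake_fence fence;
};

void fake_bo_init(struct fake_bo *bo)
{
	pthread_mutex_init(&bo->resv, NULL);
	pthread_mutex_init(&bo->fence.lock, NULL);
	pthread_cond_init(&bo->fence.cond, NULL);
	bo->fence.signaled = false;
}

void fake_fence_signal(struct fake_fence *f)
{
	pthread_mutex_lock(&f->lock);
	f->signaled = true;
	pthread_cond_broadcast(&f->cond);
	pthread_mutex_unlock(&f->lock);
}

/* Wait up to timeout_ms for the fence; true if it signaled in time. */
bool fake_fence_wait_timeout(struct fake_fence *f, long timeout_ms)
{
	struct timespec deadline;
	bool ok;
	int rc = 0;

	assert(timeout_ms >= 0);

	/* Default condvars measure absolute time against CLOCK_REALTIME. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec += 1;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&f->lock);
	/* Stop on signal, on timeout, or on any unexpected error. */
	while (!f->signaled && rc == 0)
		rc = pthread_cond_timedwait(&f->cond, &f->lock, &deadline);
	ok = f->signaled;
	pthread_mutex_unlock(&f->lock);
	return ok;
}

/*
 * Flip path: reserve the BO, wait a bounded time for the fence, and
 * skip the (imaginary) FB programming if the wait times out. The
 * reservation is released on both paths.
 * Returns 0 if the flip would proceed, -ETIMEDOUT if it was skipped.
 */
int fake_do_flip(struct fake_bo *bo, long timeout_ms)
{
	bool ok;

	pthread_mutex_lock(&bo->resv);
	ok = fake_fence_wait_timeout(&bo->fence, timeout_ms);
	/* Real code would program the new framebuffer here when ok. */
	pthread_mutex_unlock(&bo->resv);

	return ok ? 0 : -ETIMEDOUT;
}

/* Reset path: report whether the reservation can be taken right now. */
bool fake_reset_can_reserve(struct fake_bo *bo)
{
	if (pthread_mutex_trylock(&bo->resv) != 0)
		return false;
	pthread_mutex_unlock(&bo->resv);
	return true;
}
```

With a fence that never signals, fake_do_flip(&bo, 20) gives up after
about 20 ms and returns -ETIMEDOUT, and fake_reset_can_reserve(&bo)
then succeeds. With an unbounded wait (the MAX_SCHEDULE_TIMEOUT case
discussed above) the flip path would keep the reservation held and the
reset path would never get it. What a reasonable timeout would be in
the real driver is left open here, as it is in the thread.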