On Mon, Aug 04, 2014 at 07:04:46PM +0200, Christian König wrote:
> On 04.08.2014 at 17:09, Maarten Lankhorst wrote:
> >On 04-08-14 17:04, Christian König wrote:
> >>On 04.08.2014 at 16:58, Maarten Lankhorst wrote:
> >>>On 04-08-14 16:45, Christian König wrote:
> >>>>On 04.08.2014 at 16:40, Maarten Lankhorst wrote:
> >>>>>On 04-08-14 16:37, Christian König wrote:
> >>>>>>>It's a pain to deal with gpu reset.
> >>>>>>Yeah, well that's nothing new.
> >>>>>>
> >>>>>>>I've now tried other solutions but that would mean reverting to the old style during gpu lockup recovery, and only running the delayed work when !lockup.
> >>>>>>>But this meant that the timeout was useless to add. I think the cleanest is keeping the v2 patch, because potentially any waiting code can be called during lockup recovery.
> >>>>>>The lockup code itself should never call any waiting code and V2 doesn't seem to handle a couple of cases correctly either.
> >>>>>>
> >>>>>>How about moving the fence waiting out of the reset code?
> >>>>>What cases did I miss then?
> >>>>>
> >>>>>I'm curious how you want to move the fence waiting out of reset, when there are so many places that could potentially wait, like radeon_ib_get can call radeon_sa_bo_new which can do a wait, or radeon_ring_alloc that can wait on radeon_fence_wait_next, etc.
> >>>>The IB test itself doesn't need to be protected by the exclusive lock. Only everything between radeon_save_bios_scratch_regs and radeon_ring_restore.
> >>>I'm not sure about that, what do you want to do if the ring tests fail? Do you have to retake the exclusive lock?
> >>Just set need_reset again and return -EAGAIN, that should have mostly the same effect as what we are doing right now.
> >Yeah, except for locking the ttm delayed workqueue, but that bool should be easy to save/restore.
> >I think this could work.
>
> Actually you could activate the delayed workqueue much earlier as well.
>
> Thinking more about it that sounds like a bug in the current code, because
> we probably want the workqueue activated before waiting for the fence.

We've actually had a similar issue on i915 where when userspace never
waited for rendering (some shitty userspace drivers did that way back) we
never noticed that the gpu died. So launching the hangcheck/stuck wait
worker (we have both too) right away is what we do now.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch