On Fri, Jun 30, 2017 at 5:44 PM, Ville Syrjälä
<ville.syrjala@xxxxxxxxxxxxxxx> wrote:
>> And if the GEM folks insist the old behavior can't be restored, then we
>> just need a tailor-made get-out-of-jail card for gen4 gpu reset somewhere
>> in i915_sw_fence. Force-completing all render requests atomic updates
>> depend upon is imo the simplest solution to this, and we've had a driver
>> that worked like that for years.
>
> And it used to break all the time. I think we've had to fix it at least
> three times by now. So I tend to think it's better to fix it in a way
> that won't break so easily.

Why exactly is making the atomic code massively more tricky the easy
fix? That's the part I don't get. Yes, it got broken a bunch because no
one runs CI and everyone forgets that gen3/4 reset the display in gpu
reset, but in the end we do have a dependency loop, and either the
modeset side or the render side needs to bail out and cancel its async
stuff (whether that's a request or a nonblocking flip/atomic commit
doesn't matter).

In my opinion, cancelling the request (even if we're clever and only
cancel the requests for the modeset waiters, which isn't too hard to
pull off) seems about the simplest option. Especially since we need
that code anyway: even TDR can't save everything and resubmit under all
circumstances (at least the buggy batch can't be resubmitted).

Cancelling any kind of atomic commit otoh looks like a lot more
complexity. Why do you think this is the easier, or at least less
fragile, option? This patch series is full of FIXMEs, and even the more
complete set seems to have a pile of holes. Plus we need to stop using
obj->state, and I don't see any easy way to test for that (since the
gen3/4 gpu reset case is the only corner case that currently needs
that).

So I'm not seeing how this is easier or more robust at all. What am I
missing?

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch