On Fri, Jun 30, 2017 at 5:39 PM, Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> wrote:
>> Yeah, but my point is that this here is an extremely fancy and fragile
>> (and afaics in this form, incomplete) fix for something that in the past
>> was fixed much, much simpler. Why do we need this massive amount of
>> complexity now? Who's asking for all this (we don't even have anyone yet
>> asking for fully queued atomic commits, much less on gen4)?
>
> It was never "fixed", it was always borked; broken by design.

Hm, what was broken by design in gen3/4 reset? We never bothered to
resubmit rendering when the gpu died, but besides that I'm not aware of
a design issue in that logic. We nuked in-flight pageflips (and restored
those), and we stalled for any pending modesets (grabbing locks did that
since all modesets were blocking), and that made sure the hw was in a
consistent state.

We always leaked the vblank state to userspace, but this approach here
also doesn't fix this. Plus broken rendering, but for these old
platforms I'm not too worried about displaying a few wrong frames (with
the new reset we will resubmit, so proper rendering should show up
soonish) - after all gpu reset nukes the entire display, there's no way
for the user to not notice that.

It would be neat to not have to do that, and Ville has a plan, but
meanwhile we still have this regression at hand that seems to be the
blocker for adding more machines to CI. I'd like to have the least
complex path to get that addressed (but maybe not long-term fixed, I'm
clear on that). If feasible. If that's a unicorn, then let's go with
Ville's approach, but then I think we need the full thing with the races
properly closed.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch