Re: [PATCH v4 5/5] drm/i915: Solve the GPU reset vs. modeset deadlocks with an rw_semaphore

Daniel Vetter <daniel@xxxxxxxx> · Mon, 3 Jul 2017 10:03:36 +0200

On Fri, Jun 30, 2017 at 5:39 PM, Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> wrote:
>> Yeah, but my point is that this here is an extremely fancy and fragile
>> (and afaics in this form, incomplete) fix for something that in the past
>> was fixed much, much simpler. Why do we need this massive amount of
>> complexity now? Who's asking for all this (we don't even have anyone yet
>> asking for fully queued atomic commits, much less on gen4)?
>
> It was never "fixed", it was always borked; broken by design.

Hm, what was broken by design in gen3/4 reset? We never bothered to
resubmit rendering when the gpu died, but besides that I'm not aware
of a design issue in that logic. We nuked in-flight pageflips (and
restored those), and we stalled for any pending modesets (grabbing
locks did that since all modesets were blocking), and that made sure
the hw was in a consistent state.
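To make that old scheme concrete, here is a rough user-space model. It assumes a single pthread mutex stands in for the modeset locks, a flag stands in for "hw state is consistent", and a counter stands in for in-flight pageflips. All names are hypothetical and none of this is i915 code. Because a blocking modeset holds the lock for its whole duration, the reset's lock acquisition simply waits it out, so the reset can only ever observe consistent hw.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <semaphore.h>
#include <time.h>

/* Hypothetical stand-ins for the real driver state; not i915 names. */
static pthread_mutex_t modeset_lock = PTHREAD_MUTEX_INITIALIZER;
static sem_t modeset_holds_lock;  /* posted once the modeset owns the lock */
static int hw_consistent = 1;     /* 0 while a modeset reprograms the display */
static int pending_flips = 1;     /* one pageflip in flight */

/* A blocking modeset: holds the lock from start to finish. */
static void *blocking_modeset(void *arg)
{
    struct timespec programming = { 0, 10 * 1000 * 1000 }; /* 10 ms */
    (void)arg;

    pthread_mutex_lock(&modeset_lock);
    hw_consistent = 0;
    sem_post(&modeset_holds_lock);
    nanosleep(&programming, NULL); /* pretend to program the hardware */
    hw_consistent = 1;
    pthread_mutex_unlock(&modeset_lock);
    return NULL;
}

/* Returns 1 if the reset saw consistent hw and restored the pageflip. */
static int gpu_reset(void)
{
    int saw_consistent, nuked, restored;

    /* Grabbing the lock stalls until the blocking modeset has finished. */
    pthread_mutex_lock(&modeset_lock);
    saw_consistent = hw_consistent;
    nuked = pending_flips;    /* nuke the in-flight pageflips ... */
    pending_flips = 0;
    /* ... reset the GPU here ... */
    pending_flips = nuked;    /* ... then restore them */
    restored = pending_flips;
    pthread_mutex_unlock(&modeset_lock);

    return saw_consistent && restored == 1;
}

/* Start a modeset, then trigger a reset while it is still running. */
int run_reset_scenario(void)
{
    pthread_t modeset;
    int ok;

    sem_init(&modeset_holds_lock, 0, 0);
    pthread_create(&modeset, NULL, blocking_modeset, NULL);
    sem_wait(&modeset_holds_lock); /* the modeset owns the lock now */
    ok = gpu_reset();
    pthread_join(modeset, NULL);
    sem_destroy(&modeset_holds_lock);
    return ok;
}
```

Calling run_reset_scenario() from a main (built with -pthread) should return 1 regardless of timing, since the reset cannot take the lock until the modeset has dropped it. The model deliberately ignores vblank state and rendering resubmission, which is where the old scheme fell short.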

We always leaked the vblank state to userspace, but this approach here
also doesn't fix that. Then there's the broken rendering, but for these old
platforms I'm not too worried about displaying a few wrong frames
(with the new reset we will resubmit, so proper rendering should show
up soonish) - after all gpu reset nukes the entire display, there's no
way for the user to not notice that.

It would be neat to not have to do that, and Ville has a plan, but
meanwhile we still have this regression at hand that seems to be the
blocker for adding more machines to CI. I'd like to take the least
complex path to get that addressed (though maybe not fixed long-term,
I'm clear on that), if feasible.

If that's a unicorn, then let's go with Ville's approach, but then I
think we need the full thing with the races properly closed.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch