Re: [PATCH v4 5/5] drm/i915: Solve the GPU reset vs. modeset deadlocks with an rw_semaphore

Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx> · Thu, 29 Jun 2017 22:26:08 +0300

On Thu, Jun 29, 2017 at 06:57:30PM +0100, Chris Wilson wrote:
> Quoting ville.syrjala@xxxxxxxxxxxxxxx (2017-06-29 15:36:42)
> > From: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx>
> > 
> > Introduce an rw_semaphore to protect the display commits. All normal
> > commits use down_read() and hence can proceed in parallel, but GPU reset
> > will use down_write() making sure no other commits are in progress when
> > we have to pull the plug on the display engine on pre-g4x platforms.
> > There are no modeset/gem locks taken inside __intel_atomic_commit_tail()
> > itself, and we wait for all dependencies before the down_read(), and
> > thus there is no chance of deadlocks with this scheme.
> 
> This matches what I thought should be done (I didn't think of using
> rwsem just a mutex, nice touch). The point I got stuck on was what
> should be done after the reset? I expected another modeset to return the
> state back or otherwise the inflight would get confused?

I guess that can happen. For instance, if we have a crtc_enable() in
flight, and then we do a reset before it gets committed, we would end up
doing crtc_enable() twice in a row without a crtc_disable() in between.
For page flips and such this shouldn't be a big deal in general.

>  
> > During reset we should be recommiting the state that was committed last.
> > But for now we'll settle for recommiting the last state for each object.
> 
> Ah, I guess that explains the above. What is the complication with
> restoring the current state as opposed to the next state?

Well the main thing is just tracking which is the current state. That
just needs refactoring the .atomic_duplicate_state() calling convention
across the whole tree so that we can then duplicate the committed state
rather than the latest state.

Also, because commit_hw_done() can potentially be called after the
modeset locks have been dropped, I don't think we can be certain
of it getting called in the same order as swap_state(). Hence,
when we track the committed state in commit_hw_done(), we'll have
to have some way to figure out if our new state is in fact the
latest committed state for each object or if the calls got
reordered. We don't insert any dependencies between two commits
unless they touch the same active crtc, thus this reordering
seems very much possible. Dunno if we should add some way to add
such dependencies whenever the same object is part of two otherwise
independent commits, or if we just want to try and work with the
reordered calls. My idea for the latter was some kind of seqno/age
counter on the object states that allows me to recognize which
state is more recent. The object states aren't refcounted so hanging
on to the wrong pointer could cause an oops the next time we have to
perform a GPU reset.
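
To make the seqno/age idea a bit more concrete, here is a rough userspace
sketch. All the names (struct obj, struct obj_state, obj_swap_state(),
obj_commit_hw_done(), seqno_after()) are made up for illustration and are
not existing drm/i915 code. It assumes the seqno gets assigned at
swap_state() time while the modeset locks are still held, and it keeps a
copy of the committed state instead of a pointer, since the states aren't
refcounted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-object state (stand-in for a crtc/plane/connector state). */
struct obj_state {
	uint32_t seqno;	/* assigned in obj_swap_state(), i.e. in swap order */
	int value;	/* placeholder for the real state contents */
};

/* Hypothetical object tracking both the latest and the committed state. */
struct obj {
	uint32_t next_seqno;		/* last seqno handed out */
	bool has_committed;		/* false until the first commit is done */
	struct obj_state committed;	/* copy, so no dangling pointer */
};

/* Wrap-safe "a is more recent than b". */
static bool seqno_after(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) > 0;
}

/* Called in swap_state() order: stamp the new state with its age. */
static void obj_swap_state(struct obj *obj, struct obj_state *st)
{
	assert(obj && st);
	st->seqno = ++obj->next_seqno;
}

/*
 * Called when the hw programming is done, possibly out of order.
 * Returns true if st became the tracked committed state, false if a
 * more recent state had already been recorded and st was ignored.
 */
static bool obj_commit_hw_done(struct obj *obj, const struct obj_state *st)
{
	assert(obj && st);

	if (obj->has_committed &&
	    !seqno_after(st->seqno, obj->committed.seqno))
		return false;

	obj->committed = *st;
	obj->has_committed = true;
	return true;
}
```

With something like this, a commit_hw_done() that arrives late for an older
state is simply dropped, and the reset path would recommit obj->committed
rather than whatever state was swapped in last.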

-- 
Ville Syrjälä
Intel OTC