On Wed, May 08, 2013 at 04:02:00PM +0200, Daniel Vetter wrote:
> On Wed, May 08, 2013 at 02:29:30PM +0100, Chris Wilson wrote:
> > There is an unlikely corner case whereby a lockless wait may not notice
> > a GPU hang and reset, and so continue to wait for the device to advance
> > beyond the chosen seqno. This of course may never happen as the waiter
> > may be the only user. Instead, we can explicitly advance the device
> > seqno to match the requests that are forcibly retired following the
> > hang.
> >
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>
> This race is why the reset counter must always increase and can't just
> flip-flop between the reset-in-progress and everything-works states.
>
> Now if we want to unwedge on resume we need to reconsider this, but imo it
> would be easier to simply remember the reset counter before we wedge the
> gpu and restore that one (incremented as if the gpu reset worked). We
> already assume that wedged will never collide with a real reset counter,
> so this should work.

Agree that this is an unwedge-upon-resume issue, but my argument here is
that this leaves the hardware state consistent with what we forcibly
reset it to. From that perspective your suggestion is papering over this
here bug and this is the neat solution.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
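
For readers following the thread, here is a rough sketch of the two ideas
discussed above: a reset counter that only ever increases, and a device
seqno that is advanced to match the forcibly retired requests. It is a toy
model, not the i915 code; struct gpu, gpu_reset(), seqno_passed() and
wait_should_stop() are all made-up names, and the odd/even encoding of
"reset in progress" is an assumption chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* All names here are hypothetical; this is not the i915 code. */
struct gpu {
    uint32_t reset_counter; /* odd while a reset is in progress, even otherwise */
    uint32_t seqno;         /* last seqno the device reports as completed */
};

/* Wraparound-safe check: has seqno a reached seqno b? */
static bool seqno_passed(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) >= 0;
}

/* A reset bumps the counter twice, so it never returns to an old value. */
static void gpu_reset(struct gpu *g, uint32_t last_retired_seqno)
{
    g->reset_counter++;            /* now odd: reset in progress */
    /* Requests are forcibly retired; advance the seqno to match them. */
    g->seqno = last_retired_seqno;
    g->reset_counter++;            /* now even: reset complete */
}

/* One poll of a lockless wait: returns true when the waiter should stop. */
static bool wait_should_stop(const struct gpu *g, uint32_t snapshot,
                             uint32_t target)
{
    if (g->reset_counter != snapshot)
        return true;               /* a reset happened since we sampled */
    return seqno_passed(g->seqno, target);
}
```

A waiter samples reset_counter once before it starts polling and passes
that value in as snapshot. With a flag that merely flip-flops, a waiter that
sampled before the reset and polled again after it would see the same value
both times and keep waiting. In this sketch either check on its own is
enough to end the wait after a reset: the counter differs from the
snapshot, and the seqno has reached the target.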