On Thu, 20 Sep 2012 09:49:19 +0200, Daniel Vetter <daniel at ffwll.ch> wrote:
> On Wed, Sep 19, 2012 at 04:08:51PM +0100, Chris Wilson wrote:
> > In commit 69c2fc891343cb5217c866d10709343cff190bdc
> > Author: Chris Wilson <chris at chris-wilson.co.uk>
> > Date:   Fri Jul 20 12:41:03 2012 +0100
> >
> >     drm/i915: Remove the per-ring write list
> >
> > the explicit flush was removed from i915_ring_idle(). However, we
> > continued to wait upon the next seqno which now did not correspond to
> > any request (except for the unusual condition of a failure to queue a
> > request after execbuffer) and so would wait indefinitely.
> >
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>
> Afaict gem_next_request_seqno sets ring->olr and i915_wait_seqno does
> check whether olr is set and then adds the request - which for ring_idle is
> pretty much guaranteed to be missing ;-)

Yeah, ok, I'll accept that this is not the root cause of the issue.
However, I'm convinced of the merits of the patch for not adding a
request to the ring every time we idle.

> So tricky code, but I can't see the bug (and since both module unload and
> suspend works, it'd be surprised if there is one). What am I missing here?

Obviously you haven't encountered the impossible indefinite wait in
__wait_seqno(). It's fairly sporadic, but is an eater of machines...

Ah, what about a GPU reset+recovery leaving
ring->outstanding_lazy_seqno set but resetting dev_priv->next_seqno to 1?
Which is happily fixed up by this patch only looking at existing requests!
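
To make the reset scenario concrete, here is a minimal toy model in C. It is
not the driver code: toy_ring, toy_dev and the toy_* helpers are hypothetical
names, and only the lazy-seqno bookkeeping is modelled, under the assumption
that a reserved seqno is reused until something clears it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: these structs are hypothetical, not the i915 ones. */
struct toy_ring {
	/* seqno reserved for the next request, 0 if none is reserved */
	uint32_t outstanding_lazy_request;
};

struct toy_dev {
	uint32_t next_seqno;	/* next seqno to hand out */
	struct toy_ring ring;
};

/* Reserve a seqno lazily: reuse the outstanding one if it is already set. */
static uint32_t toy_next_request_seqno(struct toy_dev *dev)
{
	if (dev->ring.outstanding_lazy_request == 0)
		dev->ring.outstanding_lazy_request = dev->next_seqno++;
	return dev->ring.outstanding_lazy_request;
}

/* Reset: restart the seqno counter, optionally dropping the lazy seqno too. */
static void toy_reset(struct toy_dev *dev, bool clear_lazy)
{
	dev->next_seqno = 1;
	if (clear_lazy)
		dev->ring.outstanding_lazy_request = 0;
}

/*
 * Run the scenario: start the counter at `before` (non-zero), reserve one
 * seqno lazily, reset, then report which seqno the next caller is handed.
 */
static uint32_t toy_seqno_after_reset(uint32_t before, bool clear_lazy)
{
	struct toy_dev dev = { .next_seqno = before, .ring = { 0 } };

	(void)toy_next_request_seqno(&dev);	/* reserves `before` */
	toy_reset(&dev, clear_lazy);
	return toy_next_request_seqno(&dev);
}
```

In this model, toy_seqno_after_reset(1000, false) hands back the stale 1000
even though the counter has restarted at 1, whereas
toy_seqno_after_reset(1000, true) hands back 1. Whether the real wait path
behaves the same way depends on driver details the sketch leaves out.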

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 39de523..2286e42 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2189,6 +2189,7 @@ static void i915_gem_reset_ring_lists(struct drm_i915_priv
 		i915_gem_request_remove_from_client(request);
 		kfree(request);
 	}
+	ring->outstanding_lazy_request = 0;
 
 	while (!list_empty(&ring->active_list)) {
 		struct drm_i915_gem_object *obj;

-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre