Hi Daniel,

Could you please clarify which change you mean? I think you mean something
functionally equivalent to the code below, but done in a less hacky way.
(This small change makes no difference to the test results.) Or is the idea
to return at a different point from this?

I couldn't find "dev_priv->mm.reload_in_reset or similar" in the code. The
only thing I can find is error->reset_counter, which is used in
check_wedge(): the bottom bit set means RESET_IN_PROGRESS, and the top bit
means WEDGED.

--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1832,7 +1832,9 @@ int intel_ring_begin(struct intel_engine_cs *ring,
 	ret = i915_gem_check_wedge(&dev_priv->gpu_error,
 				   dev_priv->mm.interruptible);
-	if (ret)
+
+	/* -EAGAIN means a reset is in progress, it is Ok to return */
+	if (ret == -EAGAIN)
+		return 0;
+	if (ret)
+		return ret;
 	ret = __intel_ring_prepare(ring, num_dwords * sizeof(uint32_t));

Alistair.

-----Original Message-----
From: Intel-gfx [mailto:intel-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx] On Behalf Of Daniel Vetter
Sent: Tuesday, July 29, 2014 11:33 AM
To: Chris Wilson; Daniel Vetter; Ben Widawsky; intel-gfx@xxxxxxxxxxxxxxxxxxxxx
Subject: Re: [PATCH] drm/i915: Rework GPU reset sequence to match driver load & thaw

On Tue, Jul 29, 2014 at 08:36:33AM +0100, Chris Wilson wrote:
> On Mon, Jul 28, 2014 at 11:26:38AM +0200, Daniel Vetter wrote:
> > Oh, I guess that's the tricky bit, and why the old approach never
> > worked: because reset_in_progress is set, we failed the context/ppgtt
> > loading through the rings and screwed up.
> >
> > The problem with your approach is that we want to bail out here if a
> > reset is in progress, so we can't just eat the EAGAIN. If we do that,
> > we potentially deadlock or overflow the ring.
> >
> > I think we need a different hack here, and a few layers down (i.e.
> > at the place where we actually generate that offending -EAGAIN).
> >
> > - Around the re-init sequence in the reset function we set
> >   dev_priv->mm.reload_in_reset or similar. Since we hold
> >   dev->struct_mutex, no one will see that, as long as we never leak it
> >   out of the critical section.
> >
> > - In the ring_begin code that checks for GPU hangs, we ignore
> >   reset_in_progress if this bit is set.
> >
> > - Both places need fairly big comments to explain what exactly is going
> >   on.
>
> This is going from bad to worse. I think you can do better if you
> looked at the problem afresh.

Well, we can't really clear reset_in_progress at that point, since the reset
isn't complete yet, especially the modeset stuff. So I don't think that
reordering the reset sequence would get us out of this ugly spot. And I
don't really see any other solution. Do you?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx