On Sun, Jul 1, 2012 at 5:09 AM, Ben Widawsky <ben at bwidawsk.net> wrote: > On Tue, 26 Jun 2012 23:08:50 +0200 > Daniel Vetter <daniel.vetter at ffwll.ch> wrote: > >> Having this early check in intel_ring_begin doesn't buy us anything, >> since we'll be calling into wait_request in the usual case already >> anyway. In the corner case of not waiting for free space using the >> last_retired_head we simply need to do the same check, too. >> >> With these changes we'll only ever get an -EIO from intel_ring_begin >> if the gpu has truely been declared dead. >> >> v2: Don't conflate the change to prevent intel_ring_begin from returning >> a spurious -EIO with the refactor to use i915_gem_check_wedge, which is >> just prep work to avoid returning -EAGAIN in callsites that can't handle >> syscall restarting. > > I'm really scared by this change. It's diffuclt to review because there > are SO many callers of intel_ring_begin, and figuring out if they all > work in the wedged case is simply too difficult. I've yet to review the > rest of the series, but it may make more sense to put this change last > perhaps? Well, this patch doesn't really affect much what error codes the callers get - we'll still throw both -EGAIN and -EIO at them (later on patches will fix this). What this patch does though is prevent us from returning too many -EIO. Imagine the gpu died and we've noticed already (hence dev_priv->mm.wedged is set), but some process is stuck churning through the execbuf ioctl, holding dev->struct_mutex. While processing the execbuf we call intel_ring_begin to emit a few commands. Now usually, even when the gpu is dead, there is enough space in the ring to do so, which allows us to complete the execbuf ioctl and then later on we can block properly when trying to grab the mutex in the next ioctl for the gpu reset work handler to complete. But thanks to that wedged check in intel_ring_begin we'll instead return -EIO, despite the fact that later on we could successfully reset the gpu. Returning -EIO forces the X server to fall back to s/w rendering and disabling dri2, and in the case of a 3d compositor usually results in a abort. After this patch we can still return -EIO if the gpu is dead but the reset work hasn't completed yet, but only so if the ring is full (which in many cases is unlikely). Cheers, Daniel > >> >> Signed-Off-by: Daniel Vetter <daniel.vetter at ffwll.ch> >> --- >> drivers/gpu/drm/i915/intel_ringbuffer.c | 4 ---- >> 1 file changed, 4 deletions(-) >> >> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c >> index f30a53a..501546e 100644 >> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c >> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c >> @@ -1230,13 +1230,9 @@ int intel_wait_ring_buffer(struct intel_ring_buffer *ring, int n) >> int intel_ring_begin(struct intel_ring_buffer *ring, >> int num_dwords) >> { >> - struct drm_i915_private *dev_priv = ring->dev->dev_private; >> int n = 4*num_dwords; >> int ret; >> >> - if (unlikely(atomic_read(&dev_priv->mm.wedged))) >> - return -EIO; >> - >> if (unlikely(ring->tail + n > ring->effective_size)) { >> ret = intel_wrap_ring_buffer(ring); >> if (unlikely(ret)) > > > > -- > Ben Widawsky, Intel Open Source Technology Center > _______________________________________________ > Intel-gfx mailing list > Intel-gfx at lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Daniel Vetter daniel.vetter at ffwll.ch - +41 (0) 79 364 57 48 - http://blog.ffwll.ch