[PATCH 1/4] drm/i915: don't return a spurious -EIO from intel_ring_begin

daniel at ffwll.ch (Daniel Vetter) · Sun, 1 Jul 2012 12:41:19 +0200

On Sun, Jul 1, 2012 at 5:09 AM, Ben Widawsky <ben at bwidawsk.net> wrote:
> On Tue, 26 Jun 2012 23:08:50 +0200
> Daniel Vetter <daniel.vetter at ffwll.ch> wrote:
>
>> Having this early check in intel_ring_begin doesn't buy us anything,
>> since we'll be calling into wait_request in the usual case already
>> anyway. In the corner case of not waiting for free space using the
>> last_retired_head we simply need to do the same check, too.
>>
>> With these changes we'll only ever get an -EIO from intel_ring_begin
>> if the gpu has truely been declared dead.
>>
>> v2: Don't conflate the change to prevent intel_ring_begin from returning
>> a spurious -EIO with the refactor to use i915_gem_check_wedge, which is
>> just prep work to avoid returning -EAGAIN in callsites that can't handle
>> syscall restarting.
>
> I'm really scared by this change. It's diffuclt to review because there
> are SO many callers of intel_ring_begin, and figuring out if they all
> work in the wedged case is simply too difficult. I've yet to review the
> rest of the series, but it may make more sense to put this change last
> perhaps?

Well, this patch doesn't really affect much what error codes the
callers get - we'll still throw both -EGAIN and -EIO at them (later on
patches will fix this).

What this patch does though is prevent us from returning too many
-EIO. Imagine the gpu died and we've noticed already (hence
dev_priv->mm.wedged is set), but some process is stuck churning
through the execbuf ioctl, holding dev->struct_mutex. While processing
the execbuf we call intel_ring_begin to emit a few commands. Now
usually, even when the gpu is dead, there is enough space in the ring
to do so, which allows us to complete the execbuf ioctl and then later
on we can block properly when trying to grab the mutex in the next
ioctl for the gpu reset work handler to complete.

But thanks to that wedged check in intel_ring_begin we'll instead
return -EIO, despite the fact that later on we could successfully
reset the gpu. Returning -EIO forces the X server to fall back to s/w
rendering and disabling dri2, and in the case of a 3d compositor
usually results in a abort. After this patch we can still return -EIO
if the gpu is dead but the reset work hasn't completed yet, but only
so if the ring is full (which in many cases is unlikely).

Cheers, Daniel

>
>>
>> Signed-Off-by: Daniel Vetter <daniel.vetter at ffwll.ch>
>> ---
>>  drivers/gpu/drm/i915/intel_ringbuffer.c |    4 ----
>>  1 file changed, 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> index f30a53a..501546e 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> @@ -1230,13 +1230,9 @@ int intel_wait_ring_buffer(struct intel_ring_buffer *ring, int n)
>>  int intel_ring_begin(struct intel_ring_buffer *ring,
>>                    int num_dwords)
>>  {
>> -     struct drm_i915_private *dev_priv = ring->dev->dev_private;
>>       int n = 4*num_dwords;
>>       int ret;
>>
>> -     if (unlikely(atomic_read(&dev_priv->mm.wedged)))
>> -             return -EIO;
>> -
>>       if (unlikely(ring->tail + n > ring->effective_size)) {
>>               ret = intel_wrap_ring_buffer(ring);
>>               if (unlikely(ret))
>
>
>
> --
> Ben Widawsky, Intel Open Source Technology Center
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
daniel.vetter at ffwll.ch - +41 (0) 79 364 57 48 - http://blog.ffwll.ch