Re: [PATCH 4/7] drm/i915: Remove irq-related FIXME in reset code

Rodrigo Vivi <rodrigo.vivi@xxxxxxxxx> · Fri, 27 Feb 2015 13:54:02 -0800

cool, thanks for the detailed explanation.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@xxxxxxxxx>

On Fri, Feb 27, 2015 at 6:04 AM, Daniel Vetter <daniel@xxxxxxxx> wrote:
> On Thu, Feb 26, 2015 at 05:11:16PM -0800, Rodrigo Vivi wrote:
>> I believe this patch is on the wrong series, right?
>
> It's in here since I've spotted the FIXME while removing ums crap.
>
>> I'm afraid I don't know what was this race neither the two-step reset
>> to be able to review this comment remove.
>> Please give me some pointers to check that.
>
> Let me explain the history a bit. git blame on the various parts and this
> fixme should be able to dig out the details (it's a fun story):
>
> Originally we've had an unconditional drm_irq_install/unistall in the
> reset code. Which is not cool since it meant we'd kill all the interrutps
> that have been going on, so pageflips, vblank waits, crc checksums, gem
> waits all stopped working. This is the bug the FIXME is about.
>
> With fixed most of these issues by no longer disabling/enabling interrupts
> driver-wide, but only restoring the interrupt bits on the gt (they get
> lost in the reset). That takes care of all the modeset interrupts.
>
> The gem waits have been fixed differently and much earlier with the
> 2-stage reset code:
>
> - before reset we set a flag RESET_IN_PROGRESS and wake up all waiters.
>
> - after reset we clear that flag by incrementing the reset counter and
>   again wake all waiters
>
> Waiters always check this flag and the reset counter every time they are
> woken and bail out with -EINTR (to restart the entire ioctl) if that's the
> case. That means they'll never miss a reset and so won't be affected by
> interrupts suddenly being cleared.
>
> I've simply forgotten to remove the FIXME ;-)
>
> Cheers, Daniel
>>
>>
>> On Mon, Feb 23, 2015 at 3:03 AM, Daniel Vetter <daniel.vetter@xxxxxxxx> wrote:
>> > With the two-step reset counter increments which braket the actual
>> > reset code and the subsequent wake-up we're guaranteeing that all the
>> > lockless waiters _will_ be woken up. And since we unconditionally bail
>> > out of waits with -EAGAIN (or -EIO) in that case there is not risk of
>> > lost interrupt enabling bits when the lockless wait code races against
>> > a gpu reset.
>> >
>> > Let's remove this FIXME as resolved then.
>> >
>> > Signed-off-by: Daniel Vetter <daniel.vetter@xxxxxxxxx>
>> > ---
>> >  drivers/gpu/drm/i915/i915_drv.c | 6 ------
>> >  1 file changed, 6 deletions(-)
>> >
>> > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
>> > index cc6c51107047..89741e6e2d08 100644
>> > --- a/drivers/gpu/drm/i915/i915_drv.c
>> > +++ b/drivers/gpu/drm/i915/i915_drv.c
>> > @@ -878,12 +878,6 @@ int i915_reset(struct drm_device *dev)
>> >         }
>> >
>> >         /*
>> > -        * FIXME: This races pretty badly against concurrent holders of
>> > -        * ring interrupts. This is possible since we've started to drop
>> > -        * dev->struct_mutex in select places when waiting for the gpu.
>> > -        */
>> > -
>> > -       /*
>> >          * rps/rc6 re-init is necessary to restore state lost after the
>> >          * reset and the re-install of gt irqs. Skip for ironlake per
>> >          * previous concerns that it doesn't respond well to some forms
>> > --
>> > 2.1.4
>> >
>> > _______________________________________________
>> > Intel-gfx mailing list
>> > Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
>> > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
>>
>>
>>
>> --
>> Rodrigo Vivi
>> Blog: http://blog.vivi.eng.br
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

-- 
Rodrigo Vivi
Blog: http://blog.vivi.eng.br
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx