Re: a potential dead loop in intel_lrc_irq_handler

> -----Original Message-----
> From: Chris Wilson [mailto:chris@xxxxxxxxxxxxxxxxxx]
> Sent: Monday, August 7, 2017 5:56 PM
> To: Dong, Chuanxiao <chuanxiao.dong@xxxxxxxxx>;
> intel-gfx@xxxxxxxxxxxxxxxxxxxxx;
> Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>
> Subject: Re: a potential dead loop in intel_lrc_irq_handler
> 
> Quoting Dong, Chuanxiao (2017-08-07 10:41:29)
> > Hello,
> >
> > I found what might be a corner case where intel_lrc_irq_handler() ends up
> > in a dead loop, and want to understand whether it can really happen.
> >
> > The scenario is like:
> 
> > 1. Write wedged to trigger a GPU reset;
> 
> This is dangerous full stop, but even with a hangcheck the scenario is still
> plausible.
> 
> > 2. Meanwhile, there is one ongoing request in port[0], and its context
> >    switch interrupt is generated by the HW;
> > 3. As the interrupt line is disabled during GPU reset, it is possible
> >    that this interrupt is not handled by intel_lrc_irq_handler();
> > 4. During GPU reset, the CSB tail is reset to 0x7, which is the default
> >    value;
> 
> In theory, yes. This prevents the delayed context switch interrupt from
> having any meaning.
> 
> > 5. i915 tries to replay this request during GPU reset;
> 
> If the context-switch occurred (but still pending in IIR), the request is
> complete, it will not be replayed.
> 
> > 6. GPU reset completed;
> > 7. The pending interrupt from step#2 is handled.
> >
> > Normally, since the driver wrote the ELSP and replayed a request in step#5,
> > the CSB tail should be updated to 0 by step#7. But if the CSB tail update is
> > not that quick, and the CSB tail is still 0x7 when the last pending interrupt
> > is handled in step#7, intel_lrc_irq_handler() will be stuck in a dead loop.
> >
> > If the CSB tail update is not synchronized with the ELSP write, then my
> > understanding is that this corner case can be hit. If so, shall we clear the
> > pending interrupts in IIR during i915_reset? Please correct me if anything
> > is wrong.
> 
> The CSB buf+tail is synchronized to the interrupt. Our goal is to make sure
> that the GPU is truly reset before we reset our state tracking so that we don't
> have pending events on replay.
> 
> However, the CSB itself is a little bit of a black box as it is squirreled away in a
> power context on reset, and it is only with a bit of handwaving that it is reset
> to a default empty value on reset.
> 
> CSB interrupt -> pending
> GPU reset -> clears CSB head/tail
But the GPU reset will make CSB_head = 0 and CSB_tail = 7.

> post-reset, re-enable interrupts, raise CSB interrupt
> -> intel_lrc_irq_handler()
> 	if (CSB_head == CSB_tail)
> 		break;

So here CSB_head != CSB_tail, and intel_lrc_irq_handler() cannot break out. It looks like we would still be stuck in intel_lrc_irq_handler(), right?
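
To make the concern concrete, here is a minimal user-space sketch of the head/tail walk. It is not the driver code: CSB_ENTRIES, CSB_RESET_TAIL, csb_walk_terminates() and the max_steps bound are hypothetical names, and the 6-entry ring that wraps back to 0 is an assumption made only for illustration. Under that assumption a tail of 7 lies outside the range head can ever take, so the walk cannot reach it.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values for illustration only; not taken from the driver source. */
#define CSB_ENTRIES    6u  /* hypothetical number of ring slots */
#define CSB_RESET_TAIL 7u  /* post-reset tail value discussed in this thread */

/*
 * Advance head towards tail, wrapping to 0 at CSB_ENTRIES.
 * Returns true if head reached tail, or false if max_steps advances
 * were made without reaching it (i.e. the unbounded loop would spin).
 * The number of advances performed is stored in *steps.
 */
static bool csb_walk_terminates(unsigned int head, unsigned int tail,
				unsigned int max_steps, unsigned int *steps)
{
	unsigned int n = 0;

	/* head is a ring index, so it must start inside the ring */
	assert(head < CSB_ENTRIES);

	while (head != tail) {
		if (n == max_steps) {
			*steps = n;
			return false;
		}
		if (++head == CSB_ENTRIES)
			head = 0;
		n++;
	}

	*steps = n;
	return true;
}
```

With head = 0 and tail = CSB_RESET_TAIL the function returns false for any bound, whereas an in-range pair such as head = 4, tail = 1 returns true after 3 steps. Whether the real handler can ever observe tail = 7 at that point is exactly the open question here.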

Thanks
Chuanxiao
> 
> Should be no problem. Similarly for a delayed tasklet, we haven't posted the
> CSB interrupt and so we don't even read the CSB_head/tail as they are still
> undefined (prior to the first CSB interrupt).
> -Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx



