> -----Original Message-----
> From: Chris Wilson [mailto:chris@xxxxxxxxxxxxxxxxxx]
> Sent: Monday, August 7, 2017 5:56 PM
> To: Dong, Chuanxiao <chuanxiao.dong@xxxxxxxxx>;
> intel-gfx@xxxxxxxxxxxxxxxxxxxxx; Joonas Lahtinen
> <joonas.lahtinen@xxxxxxxxxxxxxxx>
> Subject: Re: a potential dead loop in intel_lrc_irq_handler
>
> Quoting Dong, Chuanxiao (2017-08-07 10:41:29)
> > Hello,
> >
> > Found there might be a corner case for intel_lrc_irq_handler() in a dead
> > loop, want to understand if this can be real or not.
> >
> > The scenario is like:
> >
> > 1. Write wedged to trigger a GPU reset;
>
> This is dangerous full stop, but even with a hangcheck the scenario is still
> plausible.
>
> > 2. meanwhile, there is one ongoing request in port[0], and its context
> > switch interrupt is generated from HW;
> > 3. as interrupt line is disabled during GPU reset, it is possible that
> > this interrupt is not handled by intel_lrc_irq_handler();
> > 4. during GPU reset, the CSB tail is reset to 0x7 which is a default
> > value;
>
> In theory, yes. This prevents the delayed context switch interrupt from
> having any meaning.
>
> > 5. i915 try to replay this request during GPU reset;
>
> If the context-switch occurred (but still pending in IIR), the request is
> complete, it will not be replayed.
>
> > 6. GPU reset completed;
> > 7. handling the pending interrupt of the step#2.
> >
> > Normally as in step#5 driver wrote the ELSP and replayed a request so the
> > CSB tail should be updated to 0 in step#7. But if the CSB tail updating is
> > not that quick, in step#7 when handling the last pending interrupt the CSB
> > tail is still 0x7, the intel_lrc_irq_handler() will be in a dead loop then.
> >
> > If the CSB tail updating is not synchronized with the ELSP writing then my
> > understanding is that it is possible to encounter this corner case. If so,
> > shall we clear the pending interrupts in IIR during i915_reset? Please
> > correct me if anything wrong.
>
> The CSB buf+tail is synchronized to the interrupt. Our goal is to make sure
> that the GPU is truly reset before we reset our state tracking so that we
> don't have pending events on replay.
>
> However, the CSB itself is a little bit of a black box as it is squirreled
> away in a power context on reset, and it is only with a bit of handwaving
> that it is reset to a default empty value on reset.
>
> CSB interrupt -> pending
> GPU reset -> clears CSB head/tail

But the GPU reset will make CSB_head = 0 and CSB_tail = 7.

> post-reset, re-enable interrupts, raise CSB interrupt
> -> intel_lrc_irq_handler()
>      if (CSB_head == CSB_tail)
>          break;

So here intel_lrc_irq_handler() cannot break out. Looks like we are still
stuck in intel_lrc_irq_handler(), right?

Thanks
Chuanxiao

>
> Should be no problem. Similarly for a delayed tasklet, we haven't posted the
> CSB interrupt and so we don't even read the CSB_head/tail as they are still
> undefined (prior to the first CSB interrupt).
> -Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
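
For reference, the head/tail question raised in this thread can be sketched
as a small standalone C model. This is not the i915 code. It assumes a CSB
ring of 6 entries in which the head wraps to 0 on increment; neither detail
is stated in the thread. The helper name csb_drain() and the max_steps guard
are hypothetical and exist only so that the model terminates. Under those
assumptions, head = 0 with tail = 7 can never satisfy head == tail, which is
the stuck case described above, whereas a tail inside the ring (or tail == head)
drains normally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CSB_ENTRIES    6u /* assumed ring size, not stated in the thread */
#define CSB_RESET_TAIL 7u /* post-reset tail value quoted in the thread */

/*
 * Hypothetical model of the drain loop: advance head towards tail,
 * wrapping at CSB_ENTRIES. Returns true if head reaches tail within
 * max_steps, false if the step budget runs out first. The number of
 * steps taken is stored in *steps_out.
 */
static bool csb_drain(unsigned int head, unsigned int tail,
                      unsigned int max_steps, unsigned int *steps_out)
{
    unsigned int steps = 0;

    assert(steps_out != NULL);

    while (head != tail) {
        if (steps == max_steps) {
            /* Budget exhausted: a real loop would spin forever here. */
            *steps_out = steps;
            return false;
        }
        if (++head == CSB_ENTRIES)
            head = 0; /* wrap around the ring */
        steps++;
    }

    *steps_out = steps;
    return true;
}
```

Calling csb_drain(0, 2, 1000, &steps) returns true after two steps, while
csb_drain(0, CSB_RESET_TAIL, 1000, &steps) uses up the whole budget and
returns false, because the wrapped head only ever takes the values 0..5.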