Quoting Dong, Chuanxiao (2017-08-07 10:41:29)
> Hello,
>
> Found there might be a corner case for intel_lrc_irq_handler() in a dead loop, want to understand if this can be real or not.
>
> The scenario is like:
> 1. Write wedged to trigger a GPU reset;

This is dangerous full stop, but even with a hangcheck the scenario is still plausible.

> 2. meanwhile, there is one ongoing request in port[0], and its context switch interrupt is generated from HW;
> 3. as interrupt line is disabled during GPU reset, it is possible that this interrupt is not handled by intel_lrc_irq_handler();
> 4. during GPU reset, the CSB tail is reset to 0x7 which is a default value;

In theory, yes. This prevents the delayed context switch interrupt from having any meaning.

> 5. i915 try to replay this request during GPU reset;

If the context-switch occurred (but still pending in IIR), the request is complete, it will not be replayed.

> 6. GPU reset completed;
> 7. handling the pending interrupt of the step#2.
>
> Normally as in step#5 driver wrote the ELSP and replayed a request so the CSB tail should be updated to 0 in step#7. But if the CSB tail updating is not that quick, in step#7 when handling the last pending interrupt the CSB tail is still 0x7, the intel_lrc_irq_handler() will be in a dead loop then.
>
> If the CSB tail updating is not synchronized with the ELSP writing then my understanding is that it is possible to encounter this corner case. If so, shall we clear the pending interrupts in IIR during i915_reset? Please correct me if anything wrong.

The CSB buf+tail is synchronized to the interrupt. Our goal is to make sure that the GPU is truly reset before we reset our state tracking so that we don't have pending events on replay. However, the CSB itself is a little bit of a black box as it is squirreled away in a power context on reset, and it is only with a bit of handwaving that it is reset to a default empty value on reset.
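
As an aside, the termination question can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions, not the i915 code: drain_csb(), CSB_ENTRIES (a six-entry ring is assumed here) and MAX_STEPS are hypothetical names introduced for illustration. It shows that a head/tail walk of this shape stops at once when head == tail, and only fails to terminate when the tail holds a value, such as 0x7, that the wrapping head can never reach.

```c
/* Userspace model only; names and sizes are illustrative assumptions. */
#define CSB_ENTRIES 6   /* assumed number of slots in the status ring */
#define MAX_STEPS   64  /* guard so the model itself cannot spin forever */

/*
 * Walk the ring from head towards tail, wrapping head at CSB_ENTRIES.
 * Returns the number of entries consumed, or -1 if the guard trips,
 * which stands in for the "dead loop" described in the question.
 */
static int drain_csb(unsigned int head, unsigned int tail)
{
	int consumed = 0;

	while (head != tail) {
		if (++head == CSB_ENTRIES)
			head = 0;
		if (++consumed > MAX_STEPS)
			return -1; /* tail is unreachable from head */
	}

	return consumed;
}
```

In this model drain_csb(0, 0) and drain_csb(7, 7) both return 0, because equal pointers mean there is nothing to process, while drain_csb(0, 7) returns -1 because a head that wraps at six never lands on 7. The guard exists only to keep the model finite; whether a real handler needs one is outside what this sketch can say.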

CSB interrupt
 -> pending GPU reset
 -> clears CSB head/tail post-reset, re-enable interrupts, raise CSB interrupt
 -> intel_lrc_irq_handler()
	if (CSB_head == CSB_tail) break;

Should be no problem.

Similarly for a delayed tasklet, we haven't posted the CSB interrupt and so we don't even read the CSB_head/tail as they are still undefined (prior to the first CSB interrupt).
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx