Re: [RFC 7/8] drm/i915: Skip CSB processing on invalid CSB tail

Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> · Wed, 21 Mar 2018 19:06:10 +0000



Quoting Chris Wilson (2018-03-21 18:12:51)
> Quoting Jeff McGee (2018-03-21 17:31:45)
> > On Wed, Mar 21, 2018 at 10:26:24AM -0700, jeff.mcgee@xxxxxxxxx wrote:
> > > From: Jeff McGee <jeff.mcgee@xxxxxxxxx>
> > > 
> > > Engine reset is fast. A context switch interrupt may be generated just
> > > prior to the reset such that the top half handler is racing with reset
> > > post-processing. The handler may set the irq_posted bit again after
> > > the reset code has cleared it to start fresh. Then the re-enabled
> > > tasklet will read the CSB head and tail from MMIO, which will be at
> > > the hardware reset values of 0 and 7 respectively, given that no
> > > actual CSB event has occurred since the reset. Mayhem then ensues as
> > > the tasklet starts processing invalid CSB entries.
> > > 
> > > We can handle this corner case without adding any new synchronization
> > > between the irq top half and the reset work item. The tasklet can
> > > just skip CSB processing if the tail is not sane.
> > > 
> > > Signed-off-by: Jeff McGee <jeff.mcgee@xxxxxxxxx>
> > > ---
> > If I drop this patch and substitute https://patchwork.freedesktop.org/patch/211831/
> > I will see irq_posted get set after reset which causes the first tasklet
> > run to re-process a previous CSB event and hit GEM_BUG_ON that nothing
> > was active.
> 
> However, for reset+sync to be followed by an interrupt is surprising.
> What more do we need to do after the reset to flush the last interrupt?

Actually, it may not be a late interrupt at all, just a late cacheline flush
from one processor to another. __set_bit is the non-atomic variant, and that
bites here.
-Chris