Quoting Chris Wilson (2018-03-21 18:12:51)
> Quoting Jeff McGee (2018-03-21 17:31:45)
> > On Wed, Mar 21, 2018 at 10:26:24AM -0700, jeff.mcgee@xxxxxxxxx wrote:
> > > From: Jeff McGee <jeff.mcgee@xxxxxxxxx>
> > >
> > > Engine reset is fast. A context switch interrupt may be generated just
> > > prior to the reset such that the top half handler is racing with reset
> > > post-processing. The handler may set the irq_posted bit again after
> > > the reset code has cleared it to start fresh. Then the re-enabled
> > > tasklet will read the CSB head and tail from MMIO, which will be at
> > > the hardware reset values of 0 and 7 respectively, given that no
> > > actual CSB event has occurred since the reset. Mayhem then ensues as
> > > the tasklet starts processing invalid CSB entries.
> > >
> > > We can handle this corner case without adding any new synchronization
> > > between the irq top half and the reset work item. The tasklet can
> > > just skip CSB processing if the tail is not sane.
> > >
> > > Signed-off-by: Jeff McGee <jeff.mcgee@xxxxxxxxx>
> > > ---
> >
> > If I drop this patch and substitute https://patchwork.freedesktop.org/patch/211831/
> > I will see irq_posted get set after reset which causes the first tasklet
> > run to re-process a previous CSB event and hit GEM_BUG_ON that nothing
> > was active.
>
> However, for reset+sync to be followed by an interrupt is surprising.
> What more do we need to do after the reset to flush the last interrupt?

Actually, it may not be a late interrupt, just a late cacheline flush
from one processor to another. __set_bit bites.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
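
For anyone following the thread, the "skip CSB processing if the tail is not sane" idea from the quoted patch description can be sketched roughly as below. This is not the patch itself, only a minimal standalone illustration. The names CSB_ENTRIES, csb_tail_is_sane() and csb_pending() are hypothetical, and the buffer size of six entries is an assumption; the only facts taken from the thread are that head and tail read back as 0 and 7 after a reset with no CSB event.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: six CSB entries per engine, so valid indices are 0..5. */
#define CSB_ENTRIES 6u

/*
 * Hypothetical helper: a tail that points outside the buffer (such as
 * the post-reset value 7 mentioned in the thread) is treated as not sane.
 */
static bool csb_tail_is_sane(uint32_t tail)
{
	return tail < CSB_ENTRIES;
}

/*
 * Hypothetical helper: number of CSB entries the tasklet should process.
 * Returns 0 when the tail is not sane, so the caller skips CSB
 * processing instead of walking entries that were never written.
 */
static unsigned int csb_pending(uint32_t head, uint32_t tail)
{
	if (!csb_tail_is_sane(tail))
		return 0;

	/* Ring distance from head to tail; head == tail means nothing new. */
	return (tail + CSB_ENTRIES - head % CSB_ENTRIES) % CSB_ENTRIES;
}
```

With the post-reset values from the description, csb_pending(0, 7) yields 0 and the tasklet would do nothing, while an ordinary pair such as head 5 and tail 1 yields 2 entries. Note that this only covers the out-of-range tail case; it does not address the stale irq_posted bit discussed in the replies.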