Evan Green <evgreen@xxxxxxxxxxxx> writes:
> On Mon, Mar 23, 2020 at 5:24 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>> And of course all of this is so well documented that all of us can
>> clearly figure out what's going on...
>
> I won't pretend to know what's going on, so I'll preface this by
> labeling it all as "flailing", but:
>
> I wonder if there's some way the interrupt can get delayed between
> XHCI snapping the torn value and it finding its way into the IRR. For
> instance, if xhci read this value at the start of their interrupt
> moderation timer period, that would be awful (I hope they don't do
> this). One test patch would be to carve out 8 vectors reserved for
> xhci on all cpus. Whenever you change the affinity, the assigned
> vector is always reserved_base + cpu_number. That lets you exercise
> the affinity switching code, but in a controlled manner where torn
> interrupts could be easily seen (ie hey I got an interrupt on cpu 4's
> vector but I'm cpu 2). I might struggle to write such a change, but in
> theory it's doable.

Well, the point is that we don't see a spurious interrupt on any
CPU. We added a trace_printk() into do_IRQ() and that would immediately
tell us where the thing goes off into lala land. Which it didn't.

> I was alternately trying to build a theory in my head about the write
> somehow being posted and getting out of order, but I don't think that
> can happen.

If that happens then the lost XHCI interrupt is the least of your
worries.

Thanks,

        tglx