Re: MSI interrupt for xhci still lost on 5.6-rc6 after cpu hotplug

Thomas Gleixner <tglx@xxxxxxxxxxxxx> · Tue, 24 Mar 2020 20:03:44 +0100

Evan Green <evgreen@xxxxxxxxxxxx> writes:
> On Mon, Mar 23, 2020 at 5:24 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>> And of course all of this is so well documented that all of us can
>> clearly figure out what's going on...
>
> I won't pretend to know what's going on, so I'll preface this by
> labeling it all as "flailing", but:
>
> I wonder if there's some way the interrupt can get delayed between
> XHCI snapping the torn value and it finding its way into the IRR. For
> instance, if xhci read this value at the start of their interrupt
> moderation timer period, that would be awful (I hope they don't do
> this). One test patch would be to carve out 8 vectors reserved for
> xhci on all cpus. Whenever you change the affinity, the assigned
> vector is always reserved_base + cpu_number. That lets you exercise
> the affinity switching code, but in a controlled manner where torn
> interrupts could be easily seen (ie hey I got an interrupt on cpu 4's
> vector but I'm cpu 2). I might struggle to write such a change, but in
> theory it's doable.

Well, the point is that we don't see a spurious interrupt on any
CPU. We added a traceprintk into do_IRQ() and that would immediately
tell us where the thing goes off into lala land. Which it didn't.

> I was alternately trying to build a theory in my head about the write
> somehow being posted and getting out of order, but I don't think that
> can happen.

If that happens then the lost XHCI interrupt is the least of your
worries.

Thanks,

        tglx