Re: MSI interrupt for xhci still lost on 5.6-rc6 after cpu hotplug

Thomas Gleixner <tglx@xxxxxxxxxxxxx> · Tue, 05 May 2020 23:47:26 +0200

Ashok,

"Raj, Ashok" <ashok.raj@xxxxxxxxx> writes:
> On Tue, May 05, 2020 at 09:36:04PM +0200, Thomas Gleixner wrote:
>> The way it works is:
>> 
>>     1) New vector on different CPU is allocated
>> 
>>     2) New vector is installed
>> 
>>     3) When the first interrupt on the new CPU arrives then the cleanup
>>        IPI is sent to the previous CPU which tears down the old vector
>> 
>> So if the interrupt is sent to the original CPU just before #2 then this
>> will be correctly handled on that CPU after the set affinity code
>> reenables interrupts.
>
> I'll have this test tried out.. but in msi_set_affinity() the check below
> for lapic_vector_set_in_irr(cfg->vector), this is the new vector correct? 
> don't we want to check for old_cfg.vector instead?
>
> msi_set_affinit ()
> {
> ....
>         unlock_vector_lock();
>
>         /*
>          * Check whether the transition raced with a device interrupt and
>          * is pending in the local APICs IRR. It is safe to do this outside
>          * of vector lock as the irq_desc::lock of this interrupt is still
>          * held and interrupts are disabled: The check is not accessing the
>          * underlying vector store. It's just checking the local APIC's
>          * IRR.
>          */
>         if (lapic_vector_set_in_irr(cfg->vector))
>                 irq_data_get_irq_chip(irqd)->irq_retrigger(irqd);

No. This catches the transitional interrupt to the new vector on the
original CPU, i.e. the one which is running that code.

Again the steps are:

 1) Allocate new vector on new CPU

 2) Set new vector on original CPU

 3) Set new vector on new CPU

So we have 3 points where an interrupt can fire:

 A) Before #2

 B) After #2 and before #3

 C) After #3

#A is hitting the old vector which is still valid on the old CPU and
   will be handled once interrupts are enabled with the correct irq
   descriptor - Normal operation (same as with maskable MSI)

#B This must be checked in the IRR because the there is no valid vector
   on the old CPU.

#C is handled on the new vector on the new CPU

Thanks,

        tglx