Ashok, "Raj, Ashok" <ashok.raj@xxxxxxxxx> writes: > On Fri, May 08, 2020 at 06:49:15PM +0200, Thomas Gleixner wrote: >> "Raj, Ashok" <ashok.raj@xxxxxxxxx> writes: >> > With legacy MSI we can have these races and kernel is trying to do the >> > song and dance, but we see this happening even when IR is turned on. >> > Which is perplexing. I think when we have IR, once we do the change vector >> > and flush the interrupt entry cache, if there was an outstandng one in >> > flight it should be in IRR. Possibly should be clearned up by the >> > send_cleanup_vector() i suppose. >> >> Ouch. With IR this really should never happen and yes the old vector >> will catch one which was raised just before the migration disabled the >> IR entry. During the change nothing can go wrong because the entry is >> disabled and only reenabled after it's flushed which will send a pending >> one to the new vector. > > with IR, I'm not sure if we actually mask the interrupt except when > its a Posted Interrupt. Indeed. Memory tricked me. > We do an atomic update to IRTE, with cmpxchg_double > > ret = cmpxchg_double(&irte->low, &irte->high, > irte->low, irte->high, > irte_modified->low, irte_modified->high); We only do this if: if ((irte->pst == 1) || (irte_modified->pst == 1)) i.e. the irte entry was or is going to be used for posted interrupts. In the non-posted remapping case we do: set_64bit(&irte->low, irte_modified->low); set_64bit(&irte->high, irte_modified->high); > followed by flushing the interrupt entry cache. After which any > old ones in flight before the flush should be sittig in IRR > on the outgoing cpu. Correct. > The send_cleanup_vector() sends IPI to the apic_id->old_cpu which > would be the cpu we are running on correct? and this is a self_ipi > to IRQ_MOVE_CLEANUP_VECTOR. Yes, for the IR case the cleanup vector IPI is sent right away because the IR logic allegedly guarantees that after flushing the IRTE cache no interrupt can be sent to the old vector. IOW, after the flush a vector sent to the previous CPU must be already in the IRR of that CPU. > smp_irq_move_cleanup_interrupt() seems to check IRR with > apicid_prev_vector() > > irr = apic_read(APIC_IRR + (vector / 32 * 0x10)); > if (irr & (1U << (vector % 32))) { > apic->send_IPI_self(IRQ_MOVE_CLEANUP_VECTOR); > continue; > } > > And this would allow any pending IRR bits in the outgoing CPU to > call the relevant ISR's before draining all vectors on the outgoing > CPU. No. When the CPU goes offline it does not handle anything after moving the interrupts away. The pending bits are handled in fixup_irq() after the forced migration. If the vector is set in the IRR it sends an IPI to the new target CPU. That IRR check is paranoia for the following case where interrupt migration happens between live CPUs: Old CPU set affinity <- device sends to old vector/cpu ... irte_...() send_cleanup_vector(); handle_cleanup_vector() <- Would remove the vector without that IRR check and then result in a spurious interrupt This should actually never happen because the cleanup vector is the lowest priority vector. > I couldn't quite pin down how the device ISR's are hooked up through > this send_cleanup_vector() and what follows. Device ISRs don't know anything about the underlying irq machinery. The non IR migration does: Old CPU New CPU set affinity ... <- device sends to new vector/cpu handle_irq ack() apic_ack_irq() irq_complete_move() send_cleanup_vector() handle_cleanup_vector() In case that an interrupt still hits the old CPU: set affinity <- device sends to old vector/cpu ... handle_irq ack() apic_ack_irq() irq_complete_move() (Does not send cleanup vector because wrong CPU) <- device sends to new vector/cpu handle_irq ack() apic_ack_irq() irq_complete_move() send_cleanup_vector() handle_cleanup_vector() Thanks, tglx