On 02/07/19 11:25, Wanpeng Li wrote: > From: Wanpeng Li <wanpengli@xxxxxxxxxxx> > > Thomas reported that: > > | Background: > | > | In preparation of supporting IPI shorthands I changed the CPU offline > | code to software disable the local APIC instead of just masking it. > | That's done by clearing the APIC_SPIV_APIC_ENABLED bit in the APIC_SPIV > | register. > | > | Failure: > | > | When the CPU comes back online the startup code triggers occasionally > | the warning in apic_pending_intr_clear(). That complains that the IRRs > | are not empty. > | > | The offending vector is the local APIC timer vector who's IRR bit is set > | and stays set. > | > | It took me quite some time to reproduce the issue locally, but now I can > | see what happens. > | > | It requires apicv_enabled=0, i.e. full apic emulation. With apicv_enabled=1 > | (and hardware support) it behaves correctly. > | > | Here is the series of events: > | > | Guest CPU > | > | goes down > | > | native_cpu_disable() > | > | apic_soft_disable(); > | > | play_dead() > | > | .... > | > | startup() > | > | if (apic_enabled()) > | apic_pending_intr_clear() <- Not taken > | > | enable APIC > | > | apic_pending_intr_clear() <- Triggers warning because IRR is stale > | > | When this happens then the deadline timer or the regular APIC timer - > | happens with both, has fired shortly before the APIC is disabled, but the > | interrupt was not serviced because the guest CPU was in an interrupt > | disabled region at that point. > | > | The state of the timer vector ISR/IRR bits: > | > | ISR IRR > | before apic_soft_disable() 0 1 > | after apic_soft_disable() 0 1 > | > | On startup 0 1 > | > | Now one would assume that the IRR is cleared after the INIT reset, but this > | happens only on CPU0. > | > | Why? > | > | Because our CPU0 hotplug is just for testing to make sure nothing breaks > | and goes through an NMI wakeup vehicle because INIT would send it through > | the boots-trap code which is not really working if that CPU was not > | physically unplugged. > | > | Now looking at a real world APIC the situation in that case is: > | > | ISR IRR > | before apic_soft_disable() 0 1 > | after apic_soft_disable() 0 1 > | > | On startup 0 0 > | > | Why? > | > | Once the dying CPU reenables interrupts the pending interrupt gets > | delivered as a spurious interupt and then the state is clear. > | > | While that CPU0 hotplug test case is surely an esoteric issue, the APIC > | emulation is still wrong, Even if the play_dead() code would not enable > | interrupts then the pending IRR bit would turn into an ISR .. interrupt > | when the APIC is reenabled on startup. > > > From SDM 10.4.7.2 Local APIC State After It Has Been Software Disabled > * Pending interrupts in the IRR and ISR registers are held and require > masking or handling by the CPU. > > In Thomas's testing, hardware cpu will not respect soft disable LAPIC > when IRR has already been set or APICv posted-interrupt is in flight, > so we can skip soft disable APIC checking when clearing IRR and set ISR, > continue to respect soft disable APIC when attempting to set IRR. > > Reported-by: Rong Chen <rong.a.chen@xxxxxxxxx> > Reported-by: Feng Tang <feng.tang@xxxxxxxxx> > Reported-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx> > Tested-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx> > Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx> > Cc: Radim Krčmář <rkrcmar@xxxxxxxxxx> > Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx> > Cc: Rong Chen <rong.a.chen@xxxxxxxxx> > Cc: Feng Tang <feng.tang@xxxxxxxxx> > Cc: stable@xxxxxxxxxxxxxxx > Signed-off-by: Wanpeng Li <wanpengli@xxxxxxxxxxx> > --- > arch/x86/kvm/lapic.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > index 05d8934..f857a12 100644 > --- a/arch/x86/kvm/lapic.c > +++ b/arch/x86/kvm/lapic.c > @@ -2376,7 +2376,7 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu) > struct kvm_lapic *apic = vcpu->arch.apic; > u32 ppr; > > - if (!apic_enabled(apic)) > + if (!kvm_apic_hw_enabled(apic)) > return -1; > > __apic_update_ppr(apic, &ppr); > Queued, thanks. Paolo