On Wed, Sep 28, 2022, Maxim Levitsky wrote: > On Wed, 2022-09-28 at 16:51 +0000, Sean Christopherson wrote: > > > > It happens regardless of vCPU count (tested with 2, 32, 255, 380, and > > > > 512 vCPUs). This state persists for all subsequent reboots, until the VM > > > > is terminated. For vCPU counts < 256, when x2apic is disabled the > > > > problem does not occur, and AVIC continues to work properly after reboots. > > > > Bit of a shot in the dark, but does the below fix the issue? There are two > > issues with calling kvm_lapic_xapic_id_updated() from kvm_apic_state_fixup(): > > > > 1. The xAPIC ID should only be refreshed on "set". > True - I didn't bother to fix it yet because it shouldn't cause harm, but > sure this needs to be fixed. It's probably benign on its own, but with the missing "hardware enabled" check, it could be problematic if userspace does KVM_GET_LAPIC while the APIC is hardware disabled, after the APIC was previously in x2APIC mode. I'm guessing QEMU does KVM_GET_LAPIC state when emulating reboot, hence the potential for being involved in the bug Alejandro is seeing. > > 2. The refresh needs to be noted after memcpy(vcpu->arch.apic->regs, s->regs, sizeof(*s)); > Are you sure? The check is first because if it fails, then error is returned to userspace > and the KVM's state left unchanged. > > I assume you are talking about > > .... > r = kvm_apic_state_fixup(vcpu, s, true); > if (r) { > kvm_recalculate_apic_map(vcpu->kvm); > return r; > } > memcpy(vcpu->arch.apic->regs, s->regs, sizeof(*s)); This isn't a failure path though, it's purely a "take note of the update", and KVM needs to do that processing _after_ the actual update. Specifically, kvm_lapic_xapic_id_updated() consumes the internal APIC state: if (kvm_xapic_id(apic) == apic->vcpu->vcpu_id) return; Calling that before the internal state has been set with the incoming state from userspace is simply wrong. The check that the x2APIC ID is "correct" stays where it is, this is purely the "is the xAPIC ID different" path.