On Thu, Oct 31, 2024 at 2:31 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote: > > On Wed, Oct 23, 2024, Yong He wrote: > > From: Yong He <alexyonghe@xxxxxxxxxxx> > > > > Try to enable irr_pending when set APIC state, if there is > > pending interrupt in IRR with disabled APICv. > > > > In save/restore VM scenery with disabled APICv. Qemu/CloudHypervisor > > always send signals to stop running vcpu threads, then save > > entire VM state, including APIC state. There may be a pending > > timer interrupt in the saved APIC IRR that is injected before > > vcpu_run return. But when restoring the VM, since APICv is > > disabled, irr_pending is disabled by default, so this may cause > > the timer interrupt in the IRR to be suspended for a long time, > > until the next interrupt comes. > > > > Signed-off-by: Yong He <alexyonghe@xxxxxxxxxxx> > > --- > > arch/x86/kvm/lapic.c | 4 ++++ > > 1 file changed, 4 insertions(+) > > > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > > index 2098dc689088..7373f649958b 100644 > > --- a/arch/x86/kvm/lapic.c > > +++ b/arch/x86/kvm/lapic.c > > @@ -3099,6 +3099,10 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s) > > apic_find_highest_irr(apic)); > > kvm_x86_call(hwapic_isr_update)(apic_find_highest_isr(apic)); > > } > > + > > + /* Search the IRR and enable irr_pending state with disabled APICv*/ > > + if (!enable_apicv && apic_search_irr(apic) != -1) > > This can/should be an "else" from the above "if (apic->apicv_active)". I also > think KVM can safely clear irr_pending in this case, which is also why irr_pending > isn't handling in kvm_apic_update_apicv(). When APICv is disabled (inhibited) at > runtime, an IRQ may be in-flight, i.e. apic_search_irr() can get a false negative. Thank you for your review and suggestions. > > But when stuffing APIC state, I don't see how that can happen. So this? Here is our case. APICv is disabled by set enable_apicv to 0. Create VM snapshot, then start/restore new VM base the snapshot. We occasionally encountered issues with VMs hanging for long periods of time after restore. Investigation show that there is a timer IRQ pending in IRR, but the newly restored VM could not detect it, because irr_pending is not set when restoring the APIC state by kvm_apic_set_state(). Further investigation show when creating VM snapshot, VMM pause VCPUs by signal, an in-flight timer pending in IRR, and the tscdeadline is 0 in saved APIC state. All these contexts in saved APIC state prove that kvm_inject_pending_timer_irqs had just injected a timer (will also set the tscdeadline to 0) before the VCPU handle the signal. Maybe this patch is a fix for 755c2bf87860 (KVM: x86: lapic: don't touch irr_pending in kvm_apic_update_apicv when inhibiting it), the irr_pending enable check is missed in kvm_apic_set_state() after that. > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > index 65412640cfc7..deb73aea2c06 100644 > --- a/arch/x86/kvm/lapic.c > +++ b/arch/x86/kvm/lapic.c > @@ -3086,6 +3086,15 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s) > kvm_x86_call(hwapic_irr_update)(vcpu, > apic_find_highest_irr(apic)); > kvm_x86_call(hwapic_isr_update)(apic_find_highest_isr(apic)); > + } else { > + /* > + * Note, kvm_apic_update_apicv() is responsible for updating > + * isr_count and highest_isr_cache. irr_pending is somewhat > + * special because it mustn't be cleared when APICv is disabled > + * at runtime, and only state restore can cause an IRR bit to > + * be set without also refreshing irr_pending. > + */ > + apic->irr_pending = apic_search_irr(apic) != -1; > } > kvm_make_request(KVM_REQ_EVENT, vcpu); > if (ioapic_in_kernel(vcpu->kvm)) > > > + apic->irr_pending = true; > > kvm_make_request(KVM_REQ_EVENT, vcpu); > > if (ioapic_in_kernel(vcpu->kvm)) > > kvm_rtc_eoi_tracking_restore_one(vcpu); > > -- > > 2.43.5 > >