Introduction
------------

Secure AVIC is a new hardware feature in the AMD64 architecture that
allows SEV-SNP guests to prevent the hypervisor from generating
unexpected interrupts to a vCPU or otherwise violating architectural
assumptions around APIC behavior.

One of the significant differences from AVIC or emulated x2APIC is that
Secure AVIC uses a guest-owned and managed APIC backing page. It also
introduces additional fields in both the VMCB and the Secure AVIC
backing page to aid the guest in limiting which interrupt vectors can
be injected into the guest.

Guest APIC Backing Page
-----------------------

Each vCPU has a guest-allocated APIC backing page of size 4K, which
maintains APIC state for that vCPU. The x2APIC MSRs are mapped at their
corresponding x2APIC MMIO offset within the guest APIC backing page.
All x2APIC accesses by the guest or Secure AVIC hardware operate on this
backing page. The backing page should be pinned and the NPT entry for it
should always be mapped while the corresponding vCPU is running.

MSR Accesses
------------

Secure AVIC only supports x2APIC MSR accesses. xAPIC MMIO offset based
accesses are not supported.

Some MSR writes, such as ICR writes (with shorthand equal to self),
SELF_IPI, EOI and TPR writes, are accelerated by Secure AVIC hardware.
Other MSR writes generate a #VC exception (VMEXIT_AVIC_NOACCEL or
VMEXIT_AVIC_INCOMPLETE_IPI). The #VC exception handler reads/writes to
the guest APIC backing page. As the guest APIC backing page is
accessible to the guest, the guest can optimize APIC register access by
directly reading/writing to the guest APIC backing page (instead of
taking the #VC exception route). APIC MSR reads are accelerated similar
to AVIC, as described in table "15-22. Guest vAPIC Register Access
Behavior" of the APM.

In addition to the architected MSRs, the following new fields are added
to the guest APIC backing page, which can be modified directly by the
guest:

a. 
ALLOWED_IRR

   The ALLOWED_IRR vector indicates the interrupt vectors which the
   guest allows the hypervisor to send. The combination of the
   host-controlled REQUESTED_IRR vectors (part of the VMCB) and
   ALLOWED_IRR is used by hardware to update the IRR vectors of the
   guest APIC backing page.

   #Offset    #bits    Description
   204h       31:0     Guest allowed vectors 0-31
   214h       31:0     Guest allowed vectors 32-63
   ...
   274h       31:0     Guest allowed vectors 224-255

   ALLOWED_IRR is meant to be used specifically for vectors that the
   hypervisor emulates and is allowed to inject, such as IOAPIC/MSI
   device interrupts. Interrupt vectors used exclusively by the guest
   itself (like IPI vectors) should not be allowed to be injected into
   the guest, for security reasons.

b. NMI Request

   #Offset    #bits    Description
   278h       0        Set by Guest to request Virtual NMI

   The guest can set NMI_REQUEST to trigger APIC_ICR based NMIs.

APIC Registers
--------------

1. APIC ID

   The APIC_ID value is set by KVM and, as with x2APIC, is equal to
   vcpu_id for a vCPU.

2. APIC LVR

   The APIC Version register is expected to be read from KVM's APIC
   state using an MSR_PROT rdmsr VMGEXIT and updated in the guest APIC
   backing page.

3. APIC TPR

   TPR writes are accelerated and not communicated to KVM. So, the
   hypervisor does not have information about the TPR value for a vCPU.

4. APIC PPR

   The current state of PPR is not visible to KVM.

5. APIC SPIV

   The Spurious Interrupt Vector register value is not communicated to
   KVM.

6. APIC IRR and ISR

   IRR and ISR states are visible only to the guest. So, KVM cannot use
   these registers to determine which interrupts are pending
   completion.

7. APIC TMR

   Trigger Mode Register state is owned by the guest and not visible to
   KVM.

8. Timer registers - TMICT, TMCCT, TDCR

   Timer registers are accessed using MSR_PROT VMGEXIT calls and not
   from the guest APIC backing page.

9. LVT* registers

   LVT register state is accessed from KVM's APIC state for the vCPU.
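
As an illustration of the ALLOWED_IRR and NMI_REQUEST layout described
earlier, the sketch below computes the backing page offset and bit for
a given vector. It is plain user-space C, not code from this series:
the macro and function names (SAVIC_ALLOWED_IRR_BASE,
savic_allow_vector() and so on) are hypothetical, and the 0x10 stride
between registers is inferred from the offsets in the table (204h,
214h, ..., 274h).

```c
#include <stdint.h>
#include <string.h>

/* Offsets from the ALLOWED_IRR / NMI Request tables; names are hypothetical. */
#define SAVIC_ALLOWED_IRR_BASE  0x204u
#define SAVIC_NMI_REQ_OFFSET    0x278u
#define SAVIC_REG_STRIDE        0x10u   /* assumed: 204h, 214h, ... 274h */

/* Byte offset of the 32-bit ALLOWED_IRR word that holds @vector. */
static inline uint32_t savic_allowed_irr_offset(uint8_t vector)
{
	return SAVIC_ALLOWED_IRR_BASE + (vector / 32u) * SAVIC_REG_STRIDE;
}

/* Bit mask of @vector within its 32-bit ALLOWED_IRR word. */
static inline uint32_t savic_allowed_irr_mask(uint8_t vector)
{
	return 1u << (vector % 32u);
}

/* Mark @vector as injectable by the hypervisor in a 4K backing page. */
static inline void savic_allow_vector(uint8_t *backing_page, uint8_t vector)
{
	uint32_t reg;
	uint8_t *p = backing_page + savic_allowed_irr_offset(vector);

	memcpy(&reg, p, sizeof(reg));   /* memcpy avoids aliasing issues */
	reg |= savic_allowed_irr_mask(vector);
	memcpy(p, &reg, sizeof(reg));
}

/* Set bit 0 of NMI_REQUEST to request a virtual NMI. */
static inline void savic_request_nmi(uint8_t *backing_page)
{
	uint32_t reg;
	uint8_t *p = backing_page + SAVIC_NMI_REQ_OFFSET;

	memcpy(&reg, p, sizeof(reg));
	reg |= 1u;
	memcpy(p, &reg, sizeof(reg));
}
```

For example, vector 33 falls in the word at 214h, bit 1, so calling
savic_allow_vector(page, 33) on a zeroed page sets only that bit. A real
guest would use the kernel's own APIC accessors and appropriate atomic
or ordered writes rather than memcpy().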

Idle halt Intercept
-------------------

As the hypervisor does not have access to the APIC IRR state for a
Secure AVIC guest, the idle halt intercept feature should always be
enabled for a Secure AVIC guest. Otherwise, any interrupts pending in
APIC IRR during a halt vmexit would not be serviced and the vCPU could
get stuck in halt forever. For idle halt intercept to work, the APIC
TPR value should not block the pending interrupts.

LAPIC Timer Support
-------------------

The LAPIC timer is emulated by KVM. So, the APIC_LVTT, APIC_TMICT,
APIC_TDCR and APIC_TMCCT registers are not read from or written to the
guest APIC backing page; they are communicated to the hypervisor using
MSR_PROT VMGEXIT.

IPI Support
-----------

Only SELF_IPI is accelerated by Secure AVIC hardware. Other IPI
destination shorthands result in a VMEXIT_AVIC_INCOMPLETE_IPI #VC
exception. The expected guest handling for VMEXIT_AVIC_INCOMPLETE_IPI
is:

- For interrupts, update APIC_IRR in the target vCPUs' guest APIC
  backing page.

- For NMIs, update NMI_REQUEST in the target vCPUs' guest backing page.

- ICR based SMI, INIT, SIPI requests are not supported.

- After updating the target vCPU's guest APIC backing page, the source
  vCPU does an MSR_PROT VMGEXIT.

- KVM either wakes up the non-running target vCPU or sends an AVIC
  doorbell.

Exceptions Injection
--------------------

Event injection is not supported for guests with Secure AVIC enabled in
SEV_FEATURES. So, KVM cannot inject exceptions into Secure AVIC guests.
Hardware takes care of reinjecting an interrupted exception (for
example, one interrupted by an NPF) raised in the guest on the next
VMRUN. The #VC exception is not reinjected. KVM clears all exception
intercepts for a Secure AVIC guest.

Interrupt Injection
-------------------

IOAPIC and MSI based device interrupts can be injected by KVM. The
interrupt flow for this is:

- IOAPIC/MSI interrupts are updated in KVM's APIC_IRR state via
  kvm_irq_delivery_to_apic().
- In the ->inject_irq() callback, all interrupts which are set in KVM's
  APIC_IRR are copied to the RequestedIRR VMCB field and the UpdateIRR
  bit is set.
- VMENTER moves the current value of RequestedIRR to APIC_IRR in the
  guest APIC backing page and clears UpdateIRR.

Given that hardware clearing of RequestedIRR and UpdateIRR can race
with software writes to these fields, the above interrupt injection
flow ensures that all RequestedIRR and UpdateIRR writes are done from
the same CPU on which the vCPU is run.

As interrupt delivery to the vCPU is managed by hardware, the interrupt
window is not applicable for Secure AVIC guests and interrupts are
always allowed to be injected.

PIC interrupts
--------------

Legacy PIC interrupts cannot be injected as they require event_inj or
VINTR injection support. Neither of these can be done for a Secure AVIC
guest.

PIT
---

PIT Reinject mode is not supported as it requires IRQ ack notification
on EOI. As EOI is accelerated for edge interrupts, IRQ ack notification
is not called for those interrupts.

NMI Injection
-------------

NMI injection requires ALLOWED_NMI to be set in the Secure AVIC control
MSR by the guest. Only VNMI injection is allowed.

Open Points
-----------

- RTC_GSI requires pending EOI information to detect coalesced
  interrupts. As RTC_GSI is edge triggered, Secure AVIC does not forward
  the EOI write to KVM for this interrupt. In addition, APIC_IRR and
  APIC_ISR states are not visible to KVM and are part of the guest APIC
  backing page. The approach taken in this series is to disable checking
  of coalesced RTC_GSI interrupts for Secure AVIC, which could impact
  userspace.

- EOI handling for level interrupts uses KVM's unused APIC_ISR regs for
  tracking pending level interrupts. KVM uses its APIC_TMR state to
  determine level-triggered interrupts. As KVM's APIC_TMR is updated
  from the IOAPIC redirect tables, the TMR information should be
  accurate and match the guest APIC state. This can be cleaned up later
  so that KVM's APIC_ISR state is not used and the tracking is
  maintained within sev code.

- Spurious Interrupt Vector Register writes are not visible to KVM. So,
  KVM cannot determine if the SW enabled bit is set.

- As exceptions cannot be injected by KVM, a more detailed examination
  of which intercepts need to be allowed for Secure AVIC guests is
  required.

- As KVM does not have access to the guest's APIC_IRR and APIC_ISR
  states, kvm_apic_pending_eoi() does not return correct information.

- External interrupts (PIC) are not supported. This breaks KVM's PIC
  emulation.

- PIT reinject mode is not supported.

- The current code uses KVM's vCPU APIC_IRR for interrupts which need to
  be injected into the guest. Another approach could be to maintain
  pending interrupts within sev code and inject them using a flow
  similar to posted interrupts.

This series is based on top of commit f7bafceba76e ("KVM: remove
kvm_arch_post_init_vm") and is based on
  git.kernel.org/pub/scm/virt/kvm/kvm.git next

Git tree is available at:
  https://github.com/AMDESE/linux-kvm/tree/savic-host-latest

Qemu tree is at:
  https://github.com/AMDESE/qemu/tree/secure-avic

Guest Secure AVIC support is available at:
  https://lore.kernel.org/lkml/20250226090525.231882-1-Neeraj.Upadhyay@xxxxxxx/

This series depends on the patch series below:

1. Idle Halt Intercept
   https://lore.kernel.org/all/20250128124812.7324-1-manali.shukla@xxxxxxx/

2. 
ALLOWED_SEV_FEATURES support
   https://lore.kernel.org/kvm/20250207233410.130813-1-kim.phillips@xxxxxxx/

Kishon Vijay Abraham I (5):
  KVM: SEV: Do not intercept SECURE_AVIC_CONTROL MSR
  KVM: SVM: Secure AVIC: Do not inject "Exceptions" for Secure AVIC
  KVM: SVM/SEV: Secure AVIC: Set VGIF in VMSA area
  KVM: SVM/SEV: Secure AVIC: Enable NMI support
  KVM: x86: Secure AVIC: Indicate APIC is enabled by guest SW _always_

Neeraj Upadhyay (12):
  KVM: x86: Convert guest_apic_protected bool to an enum type
  x86/cpufeatures: Add Secure AVIC CPU Feature
  KVM: SVM: Add support for Secure AVIC capability in KVM
  KVM: SVM: Initialize apic protected state for SAVIC guests
  KVM: SVM/SEV/X86: Secure AVIC: Add support to inject interrupts
  KVM: SVM/SEV/X86: Secure AVIC: Add hypervisor side IPI Delivery Support
  KVM: SVM/SEV: Do not intercept exceptions for Secure AVIC guest
  KVM: SVM/SEV: Add SVM_VMGEXIT_SECURE_AVIC GHCB protocol event handling
  KVM: x86: Secure AVIC: Add IOAPIC EOI support for level interrupts
  KVM: x86/ioapic: Disable RTC_GSI EOI tracking for protected APIC
  X86: SVM: Check injected vectors before waiting for timer expiry
  KVM: SVM/SEV: Allow creating VMs with Secure AVIC enabled

Sean Christopherson (2):
  KVM: TDX: Add support for find pending IRQ in a protected local APIC
  KVM: x86: Assume timer IRQ was injected if APIC state is protected

 arch/x86/include/asm/cpufeatures.h |   1 +
 arch/x86/include/asm/kvm-x86-ops.h |   1 +
 arch/x86/include/asm/kvm_host.h    |   1 +
 arch/x86/include/asm/msr-index.h   |   2 +
 arch/x86/include/asm/svm.h         |   9 +-
 arch/x86/include/uapi/asm/svm.h    |   3 +
 arch/x86/kvm/ioapic.c              |   8 +-
 arch/x86/kvm/irq.c                 |   6 +
 arch/x86/kvm/lapic.c               |  23 +-
 arch/x86/kvm/lapic.h               |  16 ++
 arch/x86/kvm/svm/sev.c             | 371 +++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c             |  79 ++++--
 arch/x86/kvm/svm/svm.h             |  17 +-
 arch/x86/kvm/x86.c                 |  12 +-
 14 files changed, 518 insertions(+), 31 deletions(-)

base-commit: f7bafceba76e9ab475b413578c1757ee18c3e44b
-- 
2.34.1