Linus, The following changes since commit da3ea35007d0af457a0afc87e84fddaebc4e0b63: Linux 6.11-rc7 (2024-09-08 14:50:28 -0700) are available in the Git repository at: https://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/for-linus for you to fetch changes up to efbc6bd090f48ccf64f7a8dd5daea775821d57ec: Documentation: KVM: fix warning in "make htmldocs" (2024-09-27 11:45:50 -0400) Apologize for the late pull request; all the traveling made things a bit messy. Also, we have a known regression here on ancient processors and will fix it next week. Paolo ---------------------------------------------------------------- x86: * KVM currently invalidates the entirety of the page tables, not just those for the memslot being touched, when a memslot is moved or deleted. The former does not have particularly noticeable overhead, but Intel's TDX will require the guest to re-accept private pages if they are dropped from the secure EPT, which is a non starter. Actually, the only reason why this is not already being done is a bug which was never fully investigated and caused VM instability with assigned GeForce GPUs, so allow userspace to opt into the new behavior. * Advertise AVX10.1 to userspace (effectively prep work for the "real" AVX10 functionality that is on the horizon). * Rework common MSR handling code to suppress errors on userspace accesses to unsupported-but-advertised MSRs. This will allow removing (almost?) all of KVM's exemptions for userspace access to MSRs that shouldn't exist based on the vCPU model (the actual cleanup is non-trivial future work). * Rework KVM's handling of x2APIC ICR, again, because AMD (x2AVIC) splits the 64-bit value into the legacy ICR and ICR2 storage, whereas Intel (APICv) stores the entire 64-bit value at the ICR offset. * Fix a bug where KVM would fail to exit to userspace if one was triggered by a fastpath exit handler. * Add fastpath handling of HLT VM-Exit to expedite re-entering the guest when there's already a pending wake event at the time of the exit. * Fix a WARN caused by RSM entering a nested guest from SMM with invalid guest state, by forcing the vCPU out of guest mode prior to signalling SHUTDOWN (the SHUTDOWN hits the VM altogether, not the nested guest) * Overhaul the "unprotect and retry" logic to more precisely identify cases where retrying is actually helpful, and to harden all retry paths against putting the guest into an infinite retry loop. * Add support for yielding, e.g. to honor NEED_RESCHED, when zapping rmaps in the shadow MMU. * Refactor pieces of the shadow MMU related to aging SPTEs in prepartion for adding multi generation LRU support in KVM. * Don't stuff the RSB after VM-Exit when RETPOLINE=y and AutoIBRS is enabled, i.e. when the CPU has already flushed the RSB. * Trace the per-CPU host save area as a VMCB pointer to improve readability and cleanup the retrieval of the SEV-ES host save area. * Remove unnecessary accounting of temporary nested VMCB related allocations. * Set FINAL/PAGE in the page fault error code for EPT violations if and only if the GVA is valid. If the GVA is NOT valid, there is no guest-side page table walk and so stuffing paging related metadata is nonsensical. * Fix a bug where KVM would incorrectly synthesize a nested VM-Exit instead of emulating posted interrupt delivery to L2. * Add a lockdep assertion to detect unsafe accesses of vmcs12 structures. * Harden eVMCS loading against an impossible NULL pointer deref (really truly should be impossible). * Minor SGX fix and a cleanup. * Misc cleanups Generic: * Register KVM's cpuhp and syscore callbacks when enabling virtualization in hardware, as the sole purpose of said callbacks is to disable and re-enable virtualization as needed. * Enable virtualization when KVM is loaded, not right before the first VM is created. Together with the previous change, this simplifies a lot the logic of the callbacks, because their very existence implies virtualization is enabled. * Fix a bug that results in KVM prematurely exiting to userspace for coalesced MMIO/PIO in many cases, clean up the related code, and add a testcase. * Fix a bug in kvm_clear_guest() where it would trigger a buffer overflow _if_ the gpa+len crosses a page boundary, which thankfully is guaranteed to not happen in the current code base. Add WARNs in more helpers that read/write guest memory to detect similar bugs. Selftests: * Fix a goof that caused some Hyper-V tests to be skipped when run on bare metal, i.e. NOT in a VM. * Add a regression test for KVM's handling of SHUTDOWN for an SEV-ES guest. * Explicitly include one-off assets in .gitignore. Past Sean was completely wrong about not being able to detect missing .gitignore entries. * Verify userspace single-stepping works when KVM happens to handle a VM-Exit in its fastpath. * Misc cleanups ---------------------------------------------------------------- Amit Shah (1): KVM: SVM: let alternatives handle the cases when RSB filling is required Christoph Schlameuss (7): selftests: kvm: s390: Define page sizes in shared header selftests: kvm: s390: Add kvm_s390_sie_block definition for userspace tests selftests: kvm: s390: Add s390x ucontrol test suite with hpage test selftests: kvm: s390: Add test fixture and simple VM setup tests selftests: kvm: s390: Add debug print functions selftests: kvm: s390: Add VM run test case s390: Enable KVM_S390_UCONTROL config in debug_defconfig Hariharan Mari (1): KVM: s390: Fix SORTL and DFLTCC instruction format error in __insn32_query Ilias Stamatis (1): KVM: Fix coalesced_mmio_has_room() to avoid premature userspace exit Kai Huang (2): KVM: VMX: Do not account for temporary memory allocation in ECREATE emulation KVM: VMX: Also clear SGX EDECCSSA in KVM CPU caps when SGX is disabled Li Chen (1): KVM: x86: Use this_cpu_ptr() in kvm_user_return_msr_cpu_online Maxim Levitsky (1): KVM: nVMX: Use vmx_segment_cache_clear() instead of open coded equivalent Paolo Bonzini (12): Merge tag 'kvm-s390-next-6.12-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD Merge branch 'kvm-memslot-zap-quirk' into HEAD Merge branch 'kvm-redo-enable-virt' into HEAD Merge tag 'kvm-x86-generic-6.12' of https://github.com/kvm-x86/linux into HEAD Merge tag 'kvm-x86-misc-6.12' of https://github.com/kvm-x86/linux into HEAD Merge tag 'kvm-x86-selftests-6.12' of https://github.com/kvm-x86/linux into HEAD Merge tag 'kvm-x86-mmu-6.12' of https://github.com/kvm-x86/linux into HEAD Merge tag 'kvm-x86-pat_vmx_msrs-6.12' of https://github.com/kvm-x86/linux into HEAD Merge tag 'kvm-x86-svm-6.12' of https://github.com/kvm-x86/linux into HEAD Merge tag 'kvm-x86-vmx-6.12' of https://github.com/kvm-x86/linux into HEAD Documentation: KVM: fix warning in "make htmldocs" Merge remote-tracking branch 'origin/master' into HEAD Peter Gonda (1): KVM: selftests: Add SEV-ES shutdown test Qiang Liu (1): KVM: VMX: Modify the BUILD_BUG_ON_MSG of the 32-bit field in the vmcs_check16 function Sean Christopherson (94): x86/cpu: KVM: Add common defines for architectural memory types (PAT, MTRRs, etc.) x86/cpu: KVM: Move macro to encode PAT value to common header KVM: x86: Stuff vCPU's PAT with default value at RESET, not creation KVM: nVMX: Add a helper to encode VMCS info in MSR_IA32_VMX_BASIC KVM VMX: Move MSR_IA32_VMX_MISC bit defines to asm/vmx.h KVM: nVMX: Honor userspace MSR filter lists for nested VM-Enter/VM-Exit KVM: x86/mmu: Clean up function comments for dirty logging APIs KVM: SVM: Disallow guest from changing userspace's MSR_AMD64_DE_CFG value KVM: x86: Move MSR_TYPE_{R,W,RW} values from VMX to x86, as enums KVM: x86: Rename KVM_MSR_RET_INVALID to KVM_MSR_RET_UNSUPPORTED KVM: x86: Refactor kvm_x86_ops.get_msr_feature() to avoid kvm_msr_entry KVM: x86: Rename get_msr_feature() APIs to get_feature_msr() KVM: x86: Refactor kvm_get_feature_msr() to avoid struct kvm_msr_entry KVM: x86: Funnel all fancy MSR return value handling into a common helper KVM: x86: Hoist x86.c's global msr_* variables up above kvm_do_msr_access() KVM: x86: Suppress failures on userspace access to advertised, unsupported MSRs KVM: x86: Suppress userspace access failures on unsupported, "emulated" MSRs KVM: x86: Enforce x2APIC's must-be-zero reserved ICR bits KVM: x86: Move x2APIC ICR helper above kvm_apic_write_nodecode() KVM: x86: Re-split x2APIC ICR into ICR+ICR2 for AMD (x2AVIC) KVM: selftests: Open code vcpu_run() equivalent in guest_printf test KVM: selftests: Report unhandled exceptions on x86 as regular guest asserts KVM: selftests: Add x86 helpers to play nice with x2APIC MSR #GPs KVM: selftests: Skip ICR.BUSY test in xapic_state_test if x2APIC is enabled KVM: selftests: Test x2APIC ICR reserved bits KVM: selftests: Verify the guest can read back the x2APIC ICR it wrote KVM: selftests: Play nice with AMD's AVIC errata KVM: selftests: Remove unused kvm_memcmp_hva_gva() KVM: selftests: Always unlink memory regions when deleting (VM free) KVM: x86/mmu: Decrease indentation in logic to sync new indirect shadow page KVM: x86/mmu: Drop pointless "return" wrapper label in FNAME(fetch) KVM: x86/mmu: Reword a misleading comment about checking gpte_changed() KVM: SVM: Add a helper to convert a SME-aware PA back to a struct page KVM: SVM: Add host SEV-ES save area structure into VMCB via a union KVM: SVM: Track the per-CPU host save area as a VMCB pointer KVM: selftests: Add a test for coalesced MMIO (and PIO on x86) KVM: Clean up coalesced MMIO ring full check KVM: selftests: Explicitly include committed one-off assets in .gitignore KVM: x86: Re-enter guest if WRMSR(X2APIC_ICR) fastpath is successful KVM: x86: Dedup fastpath MSR post-handling logic KVM: x86: Exit to userspace if fastpath triggers one on instruction skip KVM: x86: Reorganize code in x86.c to co-locate vCPU blocking/running helpers KVM: x86: Add fastpath handling of HLT VM-Exits KVM: Use dedicated mutex to protect kvm_usage_count to avoid deadlock KVM: Register cpuhp and syscore callbacks when enabling hardware KVM: Rename symbols related to enabling virtualization hardware KVM: Rename arch hooks related to per-CPU virtualization enabling KVM: MIPS: Rename virtualization {en,dis}abling APIs to match common KVM KVM: x86: Rename virtualization {en,dis}abling APIs to match common KVM KVM: Add a module param to allow enabling virtualization when KVM is loaded KVM: Add arch hooks for enabling/disabling virtualization x86/reboot: Unconditionally define cpu_emergency_virt_cb typedef KVM: x86: Register "emergency disable" callbacks when virt is enabled KVM: x86: Forcibly leave nested if RSM to L2 hits shutdown KVM: selftests: Verify single-stepping a fastpath VM-Exit exits to userspace KVM: x86: Move "ack" phase of local APIC IRQ delivery to separate API KVM: nVMX: Get to-be-acknowledge IRQ for nested VM-Exit at injection site KVM: nVMX: Suppress external interrupt VM-Exit injection if there's no IRQ KVM: nVMX: Detect nested posted interrupt NV at nested VM-Exit injection KVM: x86: Fold kvm_get_apic_interrupt() into kvm_cpu_get_interrupt() KVM: nVMX: Explicitly invalidate posted_intr_nv if PI is disabled at VM-Enter KVM: nVMX: Assert that vcpu->mutex is held when accessing secondary VMCSes KVM: Write the per-page "segment" when clearing (part of) a guest page KVM: Harden guest memory APIs against out-of-bounds accesses KVM: x86/mmu: Replace PFERR_NESTED_GUEST_PAGE with a more descriptive helper KVM: x86/mmu: Trigger unprotect logic only on write-protection page faults KVM: x86/mmu: Skip emulation on page fault iff 1+ SPs were unprotected KVM: x86: Retry to-be-emulated insn in "slow" unprotect path iff sp is zapped KVM: x86: Get RIP from vCPU state when storing it to last_retry_eip KVM: x86: Store gpa as gpa_t, not unsigned long, when unprotecting for retry KVM: x86/mmu: Apply retry protection to "fast nTDP unprotect" path KVM: x86/mmu: Try "unprotect for retry" iff there are indirect SPs KVM: x86: Move EMULTYPE_ALLOW_RETRY_PF to x86_emulate_instruction() KVM: x86: Fold retry_instruction() into x86_emulate_instruction() KVM: x86/mmu: Don't try to unprotect an INVALID_GPA KVM: x86/mmu: Always walk guest PTEs with WRITE access when unprotecting KVM: x86/mmu: Move event re-injection unprotect+retry into common path KVM: x86: Remove manual pfn lookup when retrying #PF after failed emulation KVM: x86: Check EMULTYPE_WRITE_PF_TO_SP before unprotecting gfn KVM: x86: Apply retry protection to "unprotect on failure" path KVM: x86: Update retry protection fields when forcing retry on emulation failure KVM: x86: Rename reexecute_instruction()=>kvm_unprotect_and_retry_on_failure() KVM: x86/mmu: Subsume kvm_mmu_unprotect_page() into the and_retry() version KVM: x86/mmu: Detect if unprotect will do anything based on invalid_list KVM: x86/mmu: WARN on MMIO cache hit when emulating write-protected gfn KVM: x86/mmu: Move walk_slot_rmaps() up near for_each_slot_rmap_range() KVM: x86/mmu: Plumb a @can_yield parameter into __walk_slot_rmaps() KVM: x86/mmu: Add a helper to walk and zap rmaps for a memslot KVM: x86/mmu: Honor NEED_RESCHED when zapping rmaps and blocking is allowed KVM: x86/mmu: Morph kvm_handle_gfn_range() into an aging specific helper KVM: x86/mmu: Fold mmu_spte_age() into kvm_rmap_age_gfn_range() KVM: x86/mmu: Add KVM_RMAP_MANY to replace open coded '1' and '1ul' literals KVM: x86/mmu: Use KVM_PAGES_PER_HPAGE() instead of an open coded equivalent KVM: VMX: Set PFERR_GUEST_{FINAL,PAGE}_MASK if and only if the GVA is valid Tao Su (1): KVM: x86: Advertise AVX10.1 CPUID to userspace Thorsten Blum (1): KVM: x86: Optimize local variable in start_sw_tscdeadline() Vitaly Kuznetsov (3): KVM: VMX: hyper-v: Prevent impossible NULL pointer dereference in evmcs_load() KVM: selftests: Move Hyper-V specific functions out of processor.c KVM: selftests: Re-enable hyperv_evmcs/hyperv_svm_test on bare metal Xin Li (5): KVM: VMX: Move MSR_IA32_VMX_BASIC bit defines to asm/vmx.h KVM: VMX: Track CPU's MSR_IA32_VMX_BASIC as a single 64-bit value KVM: nVMX: Use macros and #defines in vmx_restore_vmx_basic() KVM: VMX: Open code VMX preemption timer rate mask in its accessor KVM: nVMX: Use macros and #defines in vmx_restore_vmx_misc() Yan Zhao (4): KVM: x86/mmu: Introduce a quirk to control memslot zap behavior KVM: selftests: Test slot move/delete with slot zap quirk enabled/disabled KVM: selftests: Allow slot modification stress test with quirk disabled KVM: selftests: Test memslot move in memslot_perf_test with quirk disabled Yongqiang Liu (1): KVM: SVM: Remove unnecessary GFP_KERNEL_ACCOUNT in svm_set_nested_state() Yue Haibing (1): KVM: x86: Remove some unused declarations Documentation/admin-guide/kernel-parameters.txt | 17 + Documentation/virt/kvm/api.rst | 31 +- Documentation/virt/kvm/locking.rst | 32 +- arch/arm64/kvm/arm.c | 6 +- arch/loongarch/kvm/main.c | 4 +- arch/mips/include/asm/kvm_host.h | 4 +- arch/mips/kvm/mips.c | 8 +- arch/mips/kvm/vz.c | 8 +- arch/riscv/kvm/main.c | 4 +- arch/s390/configs/debug_defconfig | 1 + arch/s390/kvm/kvm-s390.c | 27 +- arch/x86/include/asm/cpuid.h | 1 + arch/x86/include/asm/kvm-x86-ops.h | 6 +- arch/x86/include/asm/kvm_host.h | 32 +- arch/x86/include/asm/msr-index.h | 34 +- arch/x86/include/asm/reboot.h | 2 +- arch/x86/include/asm/svm.h | 20 +- arch/x86/include/asm/vmx.h | 40 +- arch/x86/include/uapi/asm/kvm.h | 1 + arch/x86/kernel/cpu/mtrr/mtrr.c | 6 + arch/x86/kvm/cpuid.c | 30 +- arch/x86/kvm/irq.c | 10 +- arch/x86/kvm/lapic.c | 84 +- arch/x86/kvm/lapic.h | 3 +- arch/x86/kvm/mmu.h | 2 - arch/x86/kvm/mmu/mmu.c | 558 ++++++----- arch/x86/kvm/mmu/mmu_internal.h | 5 +- arch/x86/kvm/mmu/mmutrace.h | 1 + arch/x86/kvm/mmu/paging_tmpl.h | 63 +- arch/x86/kvm/mmu/tdp_mmu.c | 6 +- arch/x86/kvm/reverse_cpuid.h | 8 + arch/x86/kvm/smm.c | 24 +- arch/x86/kvm/svm/nested.c | 4 +- arch/x86/kvm/svm/svm.c | 87 +- arch/x86/kvm/svm/svm.h | 18 +- arch/x86/kvm/svm/vmenter.S | 8 +- arch/x86/kvm/vmx/capabilities.h | 10 +- arch/x86/kvm/vmx/main.c | 10 +- arch/x86/kvm/vmx/nested.c | 134 ++- arch/x86/kvm/vmx/nested.h | 8 +- arch/x86/kvm/vmx/sgx.c | 2 +- arch/x86/kvm/vmx/vmx.c | 67 +- arch/x86/kvm/vmx/vmx.h | 9 +- arch/x86/kvm/vmx/vmx_onhyperv.h | 8 + arch/x86/kvm/vmx/vmx_ops.h | 2 +- arch/x86/kvm/vmx/x86_ops.h | 7 +- arch/x86/kvm/x86.c | 1006 ++++++++++---------- arch/x86/kvm/x86.h | 31 +- arch/x86/mm/pat/memtype.c | 36 +- include/linux/kvm_host.h | 18 +- tools/testing/selftests/kvm/.gitignore | 4 + tools/testing/selftests/kvm/Makefile | 4 + tools/testing/selftests/kvm/coalesced_io_test.c | 236 +++++ tools/testing/selftests/kvm/guest_print_test.c | 19 +- tools/testing/selftests/kvm/include/kvm_util.h | 28 +- .../selftests/kvm/include/s390x/debug_print.h | 69 ++ .../selftests/kvm/include/s390x/processor.h | 5 + tools/testing/selftests/kvm/include/s390x/sie.h | 240 +++++ tools/testing/selftests/kvm/include/x86_64/apic.h | 23 +- .../testing/selftests/kvm/include/x86_64/hyperv.h | 18 + .../selftests/kvm/include/x86_64/processor.h | 7 +- tools/testing/selftests/kvm/lib/kvm_util.c | 85 +- tools/testing/selftests/kvm/lib/s390x/processor.c | 10 +- tools/testing/selftests/kvm/lib/x86_64/hyperv.c | 67 ++ tools/testing/selftests/kvm/lib/x86_64/processor.c | 69 +- .../kvm/memslot_modification_stress_test.c | 19 +- tools/testing/selftests/kvm/memslot_perf_test.c | 12 +- tools/testing/selftests/kvm/s390x/cmma_test.c | 7 +- tools/testing/selftests/kvm/s390x/config | 2 + tools/testing/selftests/kvm/s390x/debug_test.c | 4 +- tools/testing/selftests/kvm/s390x/memop.c | 4 +- tools/testing/selftests/kvm/s390x/tprot.c | 5 +- tools/testing/selftests/kvm/s390x/ucontrol_test.c | 332 +++++++ .../testing/selftests/kvm/set_memory_region_test.c | 29 +- tools/testing/selftests/kvm/x86_64/debug_regs.c | 11 +- tools/testing/selftests/kvm/x86_64/hyperv_evmcs.c | 2 +- .../testing/selftests/kvm/x86_64/hyperv_svm_test.c | 2 +- .../testing/selftests/kvm/x86_64/sev_smoke_test.c | 32 + .../selftests/kvm/x86_64/xapic_state_test.c | 54 +- .../testing/selftests/kvm/x86_64/xen_vmcall_test.c | 1 + virt/kvm/coalesced_mmio.c | 31 +- virt/kvm/kvm_main.c | 281 +++--- 82 files changed, 2803 insertions(+), 1452 deletions(-)