Paolo and others, please take a look at the proposed changes to
force_emulation_prefix, especially the last patch, which makes it
writable while KVM is running. I'm pretty confident that's safe, but
definitely want confirmation that I'm not overlooking something.

The main goal of this series is to fix KVM's longstanding bug of not
honoring L1's exception intercepts when handling an exception that
occurs during delivery of a different exception. E.g. if L0 and L1 are
using shadow paging, and L2 hits a #PF, and then hits another #PF while
vectoring the first #PF due to _L1_ not having a shadow page for the
IDT, KVM needs to check L1's intercepts before morphing the
#PF => #PF => #DF so that the #PF is routed to L1, not injected into L2
as a #DF.

nVMX has hacked around the bug for years by overriding the #PF injector
for shadow paging to go straight to VM-Exit, and nSVM has started doing
the same. The hacks mostly work, but they're incomplete, confusing, and
lead to other hacky code, e.g. bailing from the emulator because #PF
injection forced a VM-Exit and suddenly KVM is back in L1.

v5:
  - Collect reviews. [Maxim]
  - Check code breakpoints on forced emulation (FEP #UD). Hardware
    checks the RIP of the magic prefix, KVM needs to check the RIP of
    the insn.
  - Suppress code #DBs on Intel vCPUs if MOV/POP-SS blocking is active.
  - Extend forced emulation to allow clearing RFLAGS.RF (Intel sets it
    unconditionally on VM-Exit(#UD), which makes it impossible to use
    forced emulation to test code #DBs).
  - Allow writing force_emulation_prefix while KVM is running.

v4:
  - https://lore.kernel.org/all/20220723005137.1649592-1-seanjc@xxxxxxxxxx
  - Collect reviews. [Maxim]
  - Fix a bug where an intermediate patch dropped the async #PF token
    and used a stale payload. [Maxim]
  - Tweak comments to call out that AMD CPUs generate error codes with
    bits 31:16 != 0. [Maxim]

v3:
  - https://lore.kernel.org/all/20220715204226.3655170-1-seanjc@xxxxxxxxxx
  - Collect reviews. [Maxim, Jim]
  - Split a few patches into more consumable chunks. [Maxim]
  - Document that KVM doesn't correctly handle SMI+MTF (or SMI
    priority). [Maxim]
  - Add comment to document the instruction boundary (event window)
    aspect of block_nested_events. [Maxim]
  - Add a patch to rename inject_pending_events() and add a comment to
    document KVM's not-quite-architecturally-correct handling of
    instruction boundaries and asynchronous events. [Maxim]

v2:
  - https://lore.kernel.org/all/20220614204730.3359543-1-seanjc@xxxxxxxxxx
  - Rebased to kvm/queue (commit 8baacf67c76c) + selftests CPUID
    overhaul.
    https://lore.kernel.org/all/20220614200707.3315957-1-seanjc@xxxxxxxxxx
  - Treat KVM_REQ_TRIPLE_FAULT as a pending exception.

v1: https://lore.kernel.org/all/20220614204730.3359543-1-seanjc@xxxxxxxxxx

Sean Christopherson (27):
  KVM: nVMX: Unconditionally purge queued/injected events on nested "exit"
  KVM: VMX: Drop bits 31:16 when shoving exception error code into VMCS
  KVM: x86: Don't check for code breakpoints when emulating on exception
  KVM: x86: Allow clearing RFLAGS.RF on forced emulation to test code #DBs
  KVM: x86: Suppress code #DBs on Intel if MOV/POP SS blocking is active
  KVM: nVMX: Treat General Detect #DB (DR7.GD=1) as fault-like
  KVM: nVMX: Prioritize TSS T-flag #DBs over Monitor Trap Flag
  KVM: x86: Treat #DBs from the emulator as fault-like (code and DR7.GD=1)
  KVM: x86: Use DR7_GD macro instead of open coding check in emulator
  KVM: nVMX: Ignore SIPI that arrives in L2 when vCPU is not in WFS
  KVM: nVMX: Unconditionally clear mtf_pending on nested VM-Exit
  KVM: VMX: Inject #PF on ENCLS as "emulated" #PF
  KVM: x86: Rename kvm_x86_ops.queue_exception to inject_exception
  KVM: x86: Make kvm_queued_exception a properly named, visible struct
  KVM: x86: Formalize blocking of nested pending exceptions
  KVM: x86: Use kvm_queue_exception_e() to queue #DF
  KVM: x86: Hoist nested event checks above event injection logic
  KVM: x86: Evaluate ability to inject SMI/NMI/IRQ after potential VM-Exit
  KVM: nVMX: Add a helper to identify low-priority #DB traps
  KVM: nVMX: Document priority of all known events on Intel CPUs
  KVM: x86: Morph pending exceptions to pending VM-Exits at queue time
  KVM: x86: Treat pending TRIPLE_FAULT requests as pending exceptions
  KVM: VMX: Update MTF and ICEBP comments to document KVM's subtle behavior
  KVM: x86: Rename inject_pending_events() to kvm_check_and_inject_events()
  KVM: selftests: Use uapi header to get VMX and SVM exit reasons/codes
  KVM: selftests: Add an x86-only test to verify nested exception queueing
  KVM: x86: Allow force_emulation_prefix to be written without a reload

 arch/x86/include/asm/kvm-x86-ops.h           |   2 +-
 arch/x86/include/asm/kvm_host.h              |  35 +-
 arch/x86/kvm/emulate.c                       |   3 +-
 arch/x86/kvm/svm/nested.c                    | 110 ++--
 arch/x86/kvm/svm/svm.c                       |  20 +-
 arch/x86/kvm/vmx/nested.c                    | 331 ++++++++----
 arch/x86/kvm/vmx/sgx.c                       |   2 +-
 arch/x86/kvm/vmx/vmx.c                       |  54 +-
 arch/x86/kvm/x86.c                           | 488 ++++++++++++------
 arch/x86/kvm/x86.h                           |  11 +-
 tools/testing/selftests/kvm/.gitignore       |   1 +
 tools/testing/selftests/kvm/Makefile         |   1 +
 .../selftests/kvm/include/x86_64/svm_util.h  |   7 +-
 .../selftests/kvm/include/x86_64/vmx.h       |  51 +-
 .../kvm/x86_64/nested_exceptions_test.c      | 295 +++++++++++
 15 files changed, 987 insertions(+), 424 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/nested_exceptions_test.c

base-commit: 372d07084593dc7a399bf9bee815711b1fb1bcf2
--
2.37.2.672.g94769d06f0-goog