The main goal of this series is to fix KVM's longstanding bug of not
honoring L1's exception intercepts when handling an exception that
occurs during delivery of a different exception.  E.g. if L0 and L1 are
using shadow paging, and L2 hits a #PF, and then hits another #PF while
vectoring the first #PF due to _L1_ not having a shadow page for the IDT,
KVM needs to check L1's intercepts before morphing the #PF => #PF => #DF
so that the #PF is routed to L1, not injected into L2 as a #DF.

nVMX has hacked around the bug for years by overriding the #PF injector
for shadow paging to go straight to VM-Exit, and nSVM has started doing
the same.  The hacks mostly work, but they're incomplete, confusing, and
lead to other hacky code, e.g. bailing from the emulator because #PF
injection forced a VM-Exit and suddenly KVM is back in L1.

Maxim, I believe I addressed all of your comments, holler if I missed
something.

v2:
  - Collect reviews. [Maxim, Jim]
  - Split a few patches into more consumable chunks. [Maxim]
  - Document that KVM doesn't correctly handle SMI+MTF (or SMI priority).
    [Maxim]
  - Add comment to document the instruction boundary (event window)
    aspect of block_nested_events. [Maxim]
  - Add a patch to rename inject_pending_events() and add a comment to
    document KVM's not-quite-architecturally-correct handling of
    instruction boundaries and asynchronous events.
    [Maxim]

v1: https://lore.kernel.org/all/20220614204730.3359543-1-seanjc@xxxxxxxxxx

Sean Christopherson (24):
  KVM: nVMX: Unconditionally purge queued/injected events on nested "exit"
  KVM: VMX: Drop bits 31:16 when shoving exception error code into VMCS
  KVM: x86: Don't check for code breakpoints when emulating on exception
  KVM: nVMX: Treat General Detect #DB (DR7.GD=1) as fault-like
  KVM: nVMX: Prioritize TSS T-flag #DBs over Monitor Trap Flag
  KVM: x86: Treat #DBs from the emulator as fault-like (code and DR7.GD=1)
  KVM: x86: Use DR7_GD macro instead of open coding check in emulator
  KVM: nVMX: Ignore SIPI that arrives in L2 when vCPU is not in WFS
  KVM: nVMX: Unconditionally clear mtf_pending on nested VM-Exit
  KVM: VMX: Inject #PF on ENCLS as "emulated" #PF
  KVM: x86: Rename kvm_x86_ops.queue_exception to inject_exception
  KVM: x86: Make kvm_queued_exception a properly named, visible struct
  KVM: x86: Formalize blocking of nested pending exceptions
  KVM: x86: Use kvm_queue_exception_e() to queue #DF
  KVM: x86: Hoist nested event checks above event injection logic
  KVM: x86: Evaluate ability to inject SMI/NMI/IRQ after potential VM-Exit
  KVM: nVMX: Add a helper to identify low-priority #DB traps
  KVM: nVMX: Document priority of all known events on Intel CPUs
  KVM: x86: Morph pending exceptions to pending VM-Exits at queue time
  KVM: x86: Treat pending TRIPLE_FAULT requests as pending exceptions
  KVM: VMX: Update MTF and ICEBP comments to document KVM's subtle behavior
  KVM: x86: Rename inject_pending_events() to kvm_check_and_inject_events()
  KVM: selftests: Use uapi header to get VMX and SVM exit reasons/codes
  KVM: selftests: Add an x86-only test to verify nested exception queueing

 arch/x86/include/asm/kvm-x86-ops.h            |   2 +-
 arch/x86/include/asm/kvm_host.h               |  35 +-
 arch/x86/kvm/emulate.c                        |   3 +-
 arch/x86/kvm/svm/nested.c                     | 110 ++---
 arch/x86/kvm/svm/svm.c                        |  20 +-
 arch/x86/kvm/vmx/nested.c                     | 329 ++++++++-----
 arch/x86/kvm/vmx/sgx.c                        |   2 +-
 arch/x86/kvm/vmx/vmx.c                        |  53 ++-
 arch/x86/kvm/x86.c                            | 450 ++++++++++++------
 arch/x86/kvm/x86.h                            |  11 +-
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/include/x86_64/svm_util.h   |   7 +-
 .../selftests/kvm/include/x86_64/vmx.h        |  51 +-
 .../kvm/x86_64/nested_exceptions_test.c       | 295 ++++++++++++
 15 files changed, 950 insertions(+), 420 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/nested_exceptions_test.c


base-commit: 8031d87aa9953ddeb047a5356ebd0b240c30f233
-- 
2.37.0.170.g444d1eabd0-goog