On Mon, Feb 21, 2022, Paolo Bonzini wrote:
> Currently, vendor code is patching the inject_page_fault and later, on
> vmexit, expecting kvm_init_mmu to restore the inject_page_fault callback.
>
> This is brittle, as exposed by the fact that SVM KVM_SET_NESTED_STATE
> forgets to do it. Instead, do the check at the time a page fault actually
> has to be injected. This does incur the cost of an extra retpoline
> for nested vmexits when TDP is disabled, but is overall much cleaner.
> While at it, add a comment that explains why the different behavior
> is needed in this case.
>
> Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> ---

If I have NAK powers, NAK NAK NAK NAK NAK :-)

Forcing a VM-Exit is a hack, e.g. it's the entire reason
inject_emulated_exception() returns a bool.

Even worse, it's confusing and misleading due to being incomplete. The need
for the hack is not unique to !tdp_enabled, the #DF can be triggered any time
L0 is intercepting #PF. Hello, allow_smaller_maxphyaddr.

And while I think allow_smaller_maxphyaddr should be burned with fire,
architecturally it's still incomplete. Any exception that is injected by KVM
needs to be subjected to nested interception checks, not just #PF. E.g. a #GP
while vectoring a different fault should also be routed to L1. KVM (mostly)
gets away with special casing #PF because that's the only common scenario
where L1 wants to intercept _and fix_ a fault that can occur while vectoring
an exception. E.g. in the #GP => #DF case, odds are very good that L1 will
inject a #DF too, but that doesn't make KVM's behavior correct.

I have a series to handle this by performing the interception checks when an
exception is queued, instead of when KVM injects the exception, and using a
second kvm_queued_exception field to track exceptions that are queued for
VM-Exit (so as not to lose the injected exception, which needs to be saved
into vmc*12).
It's functional, though I haven't tested migration (requires minor shenanigans to perform interception checks for pending exceptions coming in from userspace).
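
To make the idea concrete, here is a rough userspace sketch of the "check at
queue time, track the VM-Exit separately" approach. It is a toy model, not
code from the series: struct toy_vcpu, toy_queue_exception(), l1_intercepts()
and the bitmap layout are all made up for illustration, and only the notion
of a second queued-exception field comes from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GP_VECTOR 13
#define PF_VECTOR 14

/* Toy stand-in for a queued exception; not KVM's actual definition. */
struct queued_exception {
	bool pending;
	uint8_t vector;
};

/* Hypothetical vCPU state, reduced to what the sketch needs. */
struct toy_vcpu {
	bool in_guest_mode;           /* true while L2 is running */
	uint32_t l1_exception_bitmap; /* bit N set => L1 intercepts vector N */
	struct queued_exception exception;        /* for the current guest */
	struct queued_exception exception_vmexit; /* to reflect to L1 */
};

/* Interception applies only while L2 runs and L1 asked for the vector. */
static bool l1_intercepts(const struct toy_vcpu *vcpu, uint8_t vector)
{
	return vcpu->in_guest_mode &&
	       (vcpu->l1_exception_bitmap & (1u << vector));
}

/*
 * Decide at queue time where the exception goes.  An intercepted exception
 * lands in the second field, so whatever is already recorded in ->exception
 * is left untouched and can still be saved for L1 on the VM-Exit.
 */
static void toy_queue_exception(struct toy_vcpu *vcpu, uint8_t vector)
{
	struct queued_exception *ex;

	assert(vector < 32);

	ex = l1_intercepts(vcpu, vector) ? &vcpu->exception_vmexit
					 : &vcpu->exception;
	ex->pending = true;
	ex->vector = vector;
}
```

With L1 intercepting only #PF, queueing a #PF while another exception is
recorded in ->exception fills ->exception_vmexit and leaves the original
alone, whereas a non-intercepted #GP goes to ->exception as usual. The sketch
ignores everything else a real implementation has to deal with (error codes,
payloads, #DF promotion, migration).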