Re: RFC: few questions about hypercall patching in KVM

Sean Christopherson <seanjc@xxxxxxxxxx> · Mon, 19 Dec 2022 18:27:54 +0000

On Thu, Dec 15, 2022, Maxim Levitsky wrote:
> On Thu, 2022-12-15 at 00:53 +0000, Sean Christopherson wrote:
> > On Wed, Dec 14, 2022, Maxim Levitsky wrote:
> > > 1. Now I suggest that when hypercall patching fails, can we do
> > > kvm_vm_bugged() instead of forwarding the hypercall?  I know that vmmcall can
> > > be executed from ring 3 as well, so I can limit this to hypercall patching
> > > that happens when guest ring is 0.
> > 
> > And L1.  But why?  It's not a KVM bug per se, it's a known deficiency in KVM's
> > emulator.  What to do in response to the failure should be up to userspace.  The
> > real "fix" is to disable the quirk in QEMU.
> 
> Yes, and L1, you are right - I thought about nested case, that maybe it is possible
> to eliminate it, but you are right, it can't be eliminated.
> 
> My reasoning for doing kvm_vm_bugged() (or returning X86EMUL_UNHANDLEABLE
> even better maybe, to give userspace a theoretical chance of dealing with it) 
> 
> is to make the error at least a bit more visible.  (I for example thought for
> a while that there is some memory corrupion in the guest caused by valgrind,
> which cause that #PF)

Yeah, the #PF is nasty, but bugging the VM isn't much better, and based on past
analysis, gracefully getting out to userspace isn't trivial.

https://lore.kernel.org/all/YUNqEeWg32kNwfO8@xxxxxxxxxx

> > > 2. Why can't we just emulate the VMCALL/VMMCALL instruction in this case
> > > instead of patching? Any technical reasons for not doing this?  Few guests
> > > use it so the perf impact should be very small.
> > 
> > Nested is basically impossible to get right[1][2].  IIRC, calling into
> > kvm_emulate_hypercall() from the emulator also gets messy (I think I tried doing
> > exactly this at some point).
> 
> It could very well be, however if L0's KVM starts to emulate both VMMCALL and
> VMCALL instructions (when the quirk is enabled) then it will be the closest
> to what KVM always did, and it will not overwrite the guest memory.
> 
> About calling into kvm_emulate_hypercall I can expect trouble, but I would be
> very happy if you recall which problems did you face.

The above link has more details than I can recall.

> Note that at least for a nested guest, we can avoid patching right away
> because both VMMCALL and VMCALL that are done in nested guest will never need
> to call kvm_emulate_hypercall().
> 
> VMCALL is always intercepted by L1 as defined by VMX spec, while VMMCALL if
> not intercepted causes #UD in the guest.
> 
> In those cases emulation is very simple.
> 
> As for L1, we already have a precedent: #GP is sometimes emulated as SVM
> instruction due to the AMD's errata.
> 
> 
> Look at gp_interception:
> 
> You first decode the instruciton, and if it is VMCALL, then call the
> kvm_emulate_hypercall() This way there is no recursive emulator call.
> 
> What do you think?

I don't love the idea of expanding out-of-emulator emulation, especially since
the behavior is quirky, i.e. KVM shouldn't emulate the wrong hypercall instruction
if userspace has disabled KVM_X86_QUIRK_FIX_HYPERCALL_INSN.

My vote is to have QEMU disable the quirk, and if necessary, "fix" QEMU's TCG to
enumerate the correct vendor.