On Thu, Dec 15, 2022, Maxim Levitsky wrote: > On Thu, 2022-12-15 at 00:53 +0000, Sean Christopherson wrote: > > On Wed, Dec 14, 2022, Maxim Levitsky wrote: > > > 1. Now I suggest that when hypercall patching fails, can we do > > > kvm_vm_bugged() instead of forwarding the hypercall? I know that vmmcall can > > > be executed from ring 3 as well, so I can limit this to hypercall patching > > > that happens when guest ring is 0. > > > > And L1. But why? It's not a KVM bug per se, it's a known deficiency in KVM's > > emulator. What to do in response to the failure should be up to userspace. The > > real "fix" is to disable the quirk in QEMU. > > Yes, and L1, you are right - I thought about nested case, that maybe it is possible > to eliminate it, but you are right, it can't be eliminated. > > My reasoning for doing kvm_vm_bugged() (or returning X86EMUL_UNHANDLEABLE > even better maybe, to give userspace a theoretical chance of dealing with it) > > is to make the error at least a bit more visible. (I for example thought for > a while that there is some memory corrupion in the guest caused by valgrind, > which cause that #PF) Yeah, the #PF is nasty, but bugging the VM isn't much better, and based on past analysis, gracefully getting out to userspace isn't trivial. https://lore.kernel.org/all/YUNqEeWg32kNwfO8@xxxxxxxxxx > > > 2. Why can't we just emulate the VMCALL/VMMCALL instruction in this case > > > instead of patching? Any technical reasons for not doing this? Few guests > > > use it so the perf impact should be very small. > > > > Nested is basically impossible to get right[1][2]. IIRC, calling into > > kvm_emulate_hypercall() from the emulator also gets messy (I think I tried doing > > exactly this at some point). > > It could very well be, however if L0's KVM starts to emulate both VMMCALL and > VMCALL instructions (when the quirk is enabled) then it will be the closest > to what KVM always did, and it will not overwrite the guest memory. > > About calling into kvm_emulate_hypercall I can expect trouble, but I would be > very happy if you recall which problems did you face. The above link has more details than I can recall. > Note that at least for a nested guest, we can avoid patching right away > because both VMMCALL and VMCALL that are done in nested guest will never need > to call kvm_emulate_hypercall(). > > VMCALL is always intercepted by L1 as defined by VMX spec, while VMMCALL if > not intercepted causes #UD in the guest. > > In those cases emulation is very simple. > > As for L1, we already have a precedent: #GP is sometimes emulated as SVM > instruction due to the AMD's errata. > > > Look at gp_interception: > > You first decode the instruciton, and if it is VMCALL, then call the > kvm_emulate_hypercall() This way there is no recursive emulator call. > > What do you think? I don't love the idea of expanding out-of-emulator emulation, especially since the behavior is quirky, i.e. KVM shouldn't emulate the wrong hypercall instruction if userspace has disabled KVM_X86_QUIRK_FIX_HYPERCALL_INSN. My vote is to have QEMU disable the quirk, and if necessary, "fix" QEMU's TCG to enumerate the correct vendor.