On Thu, Dec 12, 2024, Paolo Bonzini wrote: > On 12/12/24 20:13, Sean Christopherson wrote: > > On Thu, Dec 12, 2024, Paolo Bonzini wrote: > > > If ret is less than zero, will stop the VM anyway as > > > RUN_STATE_INTERNAL_ERROR. > > > > > > If this has to be fixed in QEMU, I think there's no need to set anything > > > if ret != 0; also because kvm_convert_memory() returns -1 on error and > > > that's not how the error would be passed to the guest. > > > > > > However, I think the right fix should simply be this in KVM: > > > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > > > index 83fe0a78146f..e2118ba93ef6 100644 > > > --- a/arch/x86/kvm/x86.c > > > +++ b/arch/x86/kvm/x86.c > > > @@ -10066,6 +10066,7 @@ unsigned long __kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr, > > > } > > > vcpu->run->exit_reason = KVM_EXIT_HYPERCALL; > > > + vcpu->run->ret = 0; > > > > vcpu->run->hypercall.ret > > > > > vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE; > > > vcpu->run->hypercall.args[0] = gpa; > > > vcpu->run->hypercall.args[1] = npages; > > > > > > While there is arguably a change in behavior of the kernel both with > > > the patches in kvm-coco-queue and with the above one, _in practice_ > > > the above change is one that userspace will not notice. > > > > I agree that KVM should initialize "ret", but I don't think '0' is the right > > value. KVM shouldn't assume userspace will successfully handle the hypercall. > > What happens if KVM sets vcpu->run->hypercall.ret to a non-zero value, e.g. -KVM_ENOSYS? > > Unfortunately QEMU is never writing vcpu->run->hypercall.ret, so the guest > sees -KVM_ENOSYS; this is basically the same bug that Binbin is fixing, just > with a different value passed to the guest. > > In other words, the above one-liner is pulling the "don't break userspace" > card. But how is anything breaking userspace? QEMU needs to opt-in to intercepting KVM_HC_MAP_GPA_RANGE, and this has been KVM's behavior since commit 0dbb11230437 ("KVM: X86: Introduce KVM_HC_MAP_GPA_RANGE hypercall"). Ah, "ret" happens to be deep in the union and KVM zero allocates vcpu->run, so QEMU gets lucky and "ret" happens to be zero because no other non-fatal userspace exit on x86 happens to need as many bytes. Hilarious. FWIW, if TDX marshalls hypercall state into KVM's "normal" registers, then KVM's shenanigans with vcpu->run->hypercall.ret might go away? Though regardless of what happens on that front, I think it makes to explicitly initialize "ret" to *something*. I checked our VMM, and it does the right thing, so I don't have any objection to explicitly zeroing "ret". Though it needs a comment explaining that it's a terrible hack for broken userspace ;-)