Re: [PATCH] i386/kvm: Set return value after handling KVM_EXIT_HYPERCALL

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Dec 12, 2024, Paolo Bonzini wrote:
> On 12/12/24 20:13, Sean Christopherson wrote:
> > On Thu, Dec 12, 2024, Paolo Bonzini wrote:
> > > If ret is less than zero, will stop the VM anyway as
> > > RUN_STATE_INTERNAL_ERROR.
> > > 
> > > If this has to be fixed in QEMU, I think there's no need to set anything
> > > if ret != 0; also because kvm_convert_memory() returns -1 on error and
> > > that's not how the error would be passed to the guest.
> > > 
> > > However, I think the right fix should simply be this in KVM:
> > > 
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index 83fe0a78146f..e2118ba93ef6 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -10066,6 +10066,7 @@ unsigned long __kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
> > >   		}
> > >   		vcpu->run->exit_reason        = KVM_EXIT_HYPERCALL;
> > > +		vcpu->run->ret                = 0;
> > 
> > 		vcpu->run->hypercall.ret
> > 
> > >   		vcpu->run->hypercall.nr       = KVM_HC_MAP_GPA_RANGE;
> > >   		vcpu->run->hypercall.args[0]  = gpa;
> > >   		vcpu->run->hypercall.args[1]  = npages;
> > > 
> > > While there is arguably a change in behavior of the kernel both with
> > > the patches in kvm-coco-queue and with the above one, _in practice_
> > > the above change is one that userspace will not notice.
> > 
> > I agree that KVM should initialize "ret", but I don't think '0' is the right
> > value.  KVM shouldn't assume userspace will successfully handle the hypercall.
> > What happens if KVM sets vcpu->run->hypercall.ret to a non-zero value, e.g. -KVM_ENOSYS?
> 
> Unfortunately QEMU is never writing vcpu->run->hypercall.ret, so the guest
> sees -KVM_ENOSYS; this is basically the same bug that Binbin is fixing, just
> with a different value passed to the guest.
> 
> In other words, the above one-liner is pulling the "don't break userspace"
> card.

But how is anything breaking userspace?  QEMU needs to opt-in to intercepting
KVM_HC_MAP_GPA_RANGE, and this has been KVM's behavior since commit 0dbb11230437
("KVM: X86: Introduce KVM_HC_MAP_GPA_RANGE hypercall").

Ah, "ret" happens to be deep in the union and KVM zero allocates vcpu->run, so
QEMU gets lucky and "ret" happens to be zero because no other non-fatal userspace
exit on x86 happens to need as many bytes.  Hilarious.

FWIW, if TDX marshalls hypercall state into KVM's "normal" registers, then KVM's
shenanigans with vcpu->run->hypercall.ret might go away?  Though regardless of
what happens on that front, I think it makes to explicitly initialize "ret" to
*something*.

I checked our VMM, and it does the right thing, so I don't have any objection
to explicitly zeroing "ret".  Though it needs a comment explaining that it's a
terrible hack for broken userspace ;-)




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux