Re: [PATCH] i386/kvm: Set return value after handling KVM_EXIT_HYPERCALL

Binbin Wu <binbin.wu@xxxxxxxxxxxxxxx> · Fri, 13 Dec 2024 09:52:34 +0800

On 12/13/2024 9:46 AM, Binbin Wu wrote:



On 12/13/2024 5:28 AM, Paolo Bonzini wrote:
On 12/12/24 20:13, Sean Christopherson wrote:
On Thu, Dec 12, 2024, Paolo Bonzini wrote:
On 12/12/24 09:07, Zhao Liu wrote:
On Thu, Dec 12, 2024 at 11:26:28AM +0800, Binbin Wu wrote:
Date: Thu, 12 Dec 2024 11:26:28 +0800
From: Binbin Wu <binbin.wu@xxxxxxxxxxxxxxx>
Subject: [PATCH] i386/kvm: Set return value after handling
   KVM_EXIT_HYPERCALL
X-Mailer: git-send-email 2.46.0

Userspace should set the ret field of hypercall after handling
KVM_EXIT_HYPERCALL.  Otherwise, a stale value could be returned to KVM.

Fixes: 47e76d03b15 ("i386/kvm: Add KVM_EXIT_HYPERCALL handling for KVM_HC_MAP_GPA_RANGE")
Reported-by: Farrah Chen <farrah.chen@xxxxxxxxx>
Signed-off-by: Binbin Wu <binbin.wu@xxxxxxxxxxxxxxx>
Tested-by: Farrah Chen <farrah.chen@xxxxxxxxx>
---
To test the TDX code in kvm-coco-queue, please apply the patch to the QEMU,
otherwise, TDX guest boot could fail.
A matching QEMU tree including this patch is here:
https://github.com/intel-staging/qemu-tdx/releases/tag/tdx-qemu-upstream-v6.1-fix_kvm_hypercall_return_value

Previously, the issue was not triggered because no one would modify the ret
value. But with the refactor patch for __kvm_emulate_hypercall() in KVM,
https://lore.kernel.org/kvm/20241128004344.4072099-7-seanjc@xxxxxxxxxx/, the
value could be modified.

Could you explain the specific reasons here in detail? It would be
helpful with debugging or reproducing the issue.

---
   target/i386/kvm/kvm.c | 8 ++++++--
   1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 8e17942c3b..4bcccb48d1 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -6005,10 +6005,14 @@ static int kvm_handle_hc_map_gpa_range(struct kvm_run *run)
   static int kvm_handle_hypercall(struct kvm_run *run)
   {
+    int ret = -EINVAL;
+
       if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE)
-        return kvm_handle_hc_map_gpa_range(run);
+        ret = kvm_handle_hc_map_gpa_range(run);
+
+    run->hypercall.ret = ret;

ret may be negative but hypercall.ret is u64. Do we need to set it to
-ret?

If ret is less than zero, will stop the VM anyway as
RUN_STATE_INTERNAL_ERROR.

If this has to be fixed in QEMU, I think there's no need to set anything
if ret != 0; also because kvm_convert_memory() returns -1 on error and
that's not how the error would be passed to the guest.

However, I think the right fix should simply be this in KVM:

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 83fe0a78146f..e2118ba93ef6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10066,6 +10066,7 @@ unsigned long __kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
          }
          vcpu->run->exit_reason        = KVM_EXIT_HYPERCALL;
+        vcpu->run->ret                = 0;

        vcpu->run->hypercall.ret

vcpu->run->hypercall.nr       = KVM_HC_MAP_GPA_RANGE;
          vcpu->run->hypercall.args[0]  = gpa;
          vcpu->run->hypercall.args[1]  = npages;

While there is arguably a change in behavior of the kernel both with
the patches in kvm-coco-queue and with the above one, _in practice_
the above change is one that userspace will not notice.

I agree that KVM should initialize "ret", but I don't think '0' is the right
value.  KVM shouldn't assume userspace will successfully handle the hypercall.
What happens if KVM sets vcpu->run->hypercall.ret to a non-zero value, e.g. -KVM_ENOSYS?

Unfortunately QEMU is never writing vcpu->run->hypercall.ret, so the guest sees -KVM_ENOSYS; this is basically the same bug that Binbin is fixing, just with a different value passed to the guest.

In other words, the above one-liner is pulling the "don't break userspace" card.

Paolo


If the change need to be done in KVM, there are other 3 functions that use
KVM_EXIT_HYPERCALL based on the code in kvm-coco-queue.

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 40fe7258843e..a624f7289282 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3633,6 +3633,7 @@ static int snp_begin_psc_msr(struct vcpu_svm *svm, u64 ghcb_msr)
        }

        vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;
+       vcpu->run->ret         = 0;
        vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
        vcpu->run->hypercall.args[0] = gpa;
        vcpu->run->hypercall.args[1] = 1;
@@ -3796,6 +3797,7 @@ static int snp_begin_psc(struct vcpu_svm *svm, struct psc_buffer *psc)
        case VMGEXIT_PSC_OP_PRIVATE:
        case VMGEXIT_PSC_OP_SHARED:
                vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;
+               vcpu->run->ret         = 0;
                vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
                vcpu->run->hypercall.args[0] = gfn_to_gpa(gfn);
                vcpu->run->hypercall.args[1] = npages;
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 85c8aee263c1..c50c2edc8c56 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1161,6 +1161,7 @@ static void __tdx_map_gpa(struct vcpu_tdx * tdx)
        pr_err("%s: gpa = 0x%llx, size = 0x%llx", __func__, gpa, size);

        tdx->vcpu.run->exit_reason       = KVM_EXIT_HYPERCALL;
+       tdx->vcpu->run->ret              = 0;
        tdx->vcpu.run->hypercall.nr      = KVM_HC_MAP_GPA_RANGE;
        tdx->vcpu.run->hypercall.args[0] = gpa & ~gfn_to_gpa(kvm_gfn_direct_bits(tdx->vcpu.kvm));
        tdx->vcpu.run->hypercall.args[1] = size / PAGE_SIZE;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4f94b1e24eae..3f82bb2357e3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10070,6 +10070,7 @@ int ____kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
                }

                vcpu->run->exit_reason        = KVM_EXIT_HYPERCALL;
+               vcpu->run->ret                = 0;
                vcpu->run->hypercall.nr       = KVM_HC_MAP_GPA_RANGE;
                vcpu->run->hypercall.args[0]  = gpa;
                vcpu->run->hypercall.args[1]  = npages;

Maybe we could add a helper to fill the vcpu->run?