Re: [PATCH v4 2/2] KVM: nSVM: implement ondemand allocation of the nested state

Maxim Levitsky <mlevitsk@xxxxxxxxxx> · Mon, 21 Sep 2020 11:57:27 +0300

On Mon, 2020-09-21 at 10:53 +0300, Maxim Levitsky wrote:
> On Sun, 2020-09-20 at 18:42 +0200, Paolo Bonzini wrote:
> > On 20/09/20 18:16, Sean Christopherson wrote:
> > > > Maxim, your previous version was adding some error handling to
> > > > kvm_x86_ops.set_efer.  I don't remember what was the issue; did you have
> > > > any problems propagating all the errors up to KVM_SET_SREGS (easy),
> > > > kvm_set_msr (harder) etc.?
> > > I objected to letting .set_efer() return a fault.
> > 
> > So did I, and that's why we get KVM_REQ_OUT_OF_MEMORY.  But it was more
> > of an "it's ugly and it ought not to fail" thing than something I could
> > pinpoint.
> > 
> > It looks like we agree, but still we have to choose the lesser evil?
> > 
> > Paolo
> > 
> > > A relatively minor issue is
> > > the code in vmx_set_efer() that handles lack of EFER because technically KVM
> > > can emulate EFER.SCE+SYSCALL without supporting EFER in hardware.  Returning
> > > success/'0' would avoid that particular issue.  My primary concern is that I'd
> > > prefer not to add another case where KVM can potentially ignore a fault
> > > indicated by a helper, a la vmx_set_cr4().
> 
> The thing is that kvm_emulate_wrmsr injects #GP when kvm_set_msr returns any non zero value,
> and returns 1 which means keep on going if I understand correctly (0 is userspace exit,
> negative value would be a return to userspace with an error)
> 
> So the question is if we have other wrmsr handlers which return negative value, and would
> be affected by changing kvm_emulate_wrmsr to pass through the error value.
> I am checking the code now.
> 
> I do agree now that this is the *correct* solution to this problem.
> 
> Best regards,
> 	Maxim Levitsky


So here are the results of my analysis:

Functions on the WRMSR path that can return a negative value (I could have missed
something, but I double-checked the WRMSR code in both SVM and VMX, and in the common x86 code):

vmx_set_vmx_msr - this is only called from userspace (msr_info->host_initiated == true),
so this can be left as is

xen_hvm_config - this code should probably return 1 in some cases, but in one case
it legitimately does a memory allocation like I do, and a failure there should probably
kill the guest as well (but I can keep it as is if we are afraid that the new behavior
will not be backward compatible)

What do you think about this (only compile-tested, since I don't have any Xen setups):

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 36e963dc1da61..66a57c5b14dfd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2695,24 +2695,19 @@ static int xen_hvm_config(struct kvm_vcpu *vcpu, u64 data)
        u32 page_num = data & ~PAGE_MASK;
        u64 page_addr = data & PAGE_MASK;
        u8 *page;
-       int r;
 
-       r = -E2BIG;
        if (page_num >= blob_size)
-               goto out;
-       r = -ENOMEM;
+               return 1;
+
        page = memdup_user(blob_addr + (page_num * PAGE_SIZE), PAGE_SIZE);
-       if (IS_ERR(page)) {
-               r = PTR_ERR(page);
-               goto out;
+       if (IS_ERR(page))
+               return PTR_ERR(page);
+       if (kvm_vcpu_write_guest(vcpu, page_addr, page, PAGE_SIZE)) {
+               kfree(page);
+               return 1;
        }
-       if (kvm_vcpu_write_guest(vcpu, page_addr, page, PAGE_SIZE))
-               goto out_free;
-       r = 0;
-out_free:
-       kfree(page);
-out:
-       return r;
+       kfree(page);
+       return 0;
 }


The MSR write itself can be reached from the guest through two functions:
kvm_emulate_wrmsr, which is called on WRMSR interception from both VMX and SVM,
and em_wrmsr, which is called in the unlikely case that the emulator needs to emulate a WRMSR.

Both should be changed to inject #GP only on a positive return value and to pass
a negative error value through otherwise.
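
To make the proposed convention concrete, here is a minimal userspace sketch
(not kernel code) of the dispatch on the value returned by kvm_set_msr. The
enum and the names classify_wrmsr_result and model_emulate_wrmsr are
hypothetical and exist only for this illustration; the assumption, taken from
the discussion above, is that returning 1 means "keep running the guest" and
a negative value means "return to userspace with an error".

```c
#include <errno.h>

/* Hypothetical outcome of a guest WRMSR exit, for illustration only. */
enum wrmsr_action {
	WRMSR_RESUME_GUEST,	/* handler succeeded */
	WRMSR_INJECT_GP,	/* guest-visible fault */
	WRMSR_EXIT_WITH_ERROR	/* host-side failure, e.g. -ENOMEM */
};

/* 'r' stands in for the value returned by kvm_set_msr(). */
static enum wrmsr_action classify_wrmsr_result(int r)
{
	if (r > 0)
		return WRMSR_INJECT_GP;
	if (r < 0)
		return WRMSR_EXIT_WITH_ERROR;
	return WRMSR_RESUME_GUEST;
}

/*
 * Model of what a kvm_emulate_wrmsr()-like handler would hand back:
 * 1 to keep running the guest, or the negative error unchanged.
 */
static int model_emulate_wrmsr(int r)
{
	switch (classify_wrmsr_result(r)) {
	case WRMSR_EXIT_WITH_ERROR:
		return r;	/* pass the error through to userspace */
	case WRMSR_INJECT_GP:	/* #GP would be queued here */
	case WRMSR_RESUME_GUEST:
	default:
		return 1;	/* guest keeps running */
	}
}
```

With this split, a handler that returns 1 still produces a #GP in the guest,
while an allocation failure such as -ENOMEM is no longer turned into a #GP.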

Sounds reasonable? If you agree I'll post the patches implementing this.

Best regards,
	Maxim Levitsky