Re: [PATCH] KVM: SVM: Use a separate vmcb for the nested L2 guest

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 2020-10-08 at 13:39 +0300, Maxim Levitsky wrote:
> On Thu, 2020-10-08 at 13:23 +0300, Maxim Levitsky wrote:
> > On Thu, 2020-10-08 at 07:52 +0200, Paolo Bonzini wrote:
> > > On 08/10/20 00:14, Maxim Levitsky wrote:
> > > > > +	if (svm->vmcb01->control.asid == 0)
> > > > > +		svm->vmcb01->control.asid = svm->nested.vmcb02->control.asid;
> > > > 
> > > > I think that the above should be done always. The asid field is currently host
> > > > controlled only (that is L2 value is ignored, selective ASID tlb flush is not
> > > > advertized to the guest and lnvlpga is emulated as invlpg). 
> > > 
> > > Yes, in fact I suggested that ASID should be in svm->asid and moved to
> > > svm->vmcb->asid in svm_vcpu_run.  Then there's no need to special case
> > > it in nested code.
> > This makes lot of sense!
> > > This should be a patch coming before this one.
> > > 
> > > > 1. Something wrong with memory types - like guest is using UC memory for everything.
> > > >     I can't completely rule that out yet
> > > 
> > > You can print g_pat and see if it is all zeroes.
> > I don't even need to print it. I know that it is never set anywhere, unless guest writes it,
> > but now that I look at it, we set it to a default value and there is no code to set it to
> > default value for vmcb02. This is it. now my fedora guest boots just fine!
> > 
> > I played a lot with g_pat, and yet this didn't occur to me . I was that close :-(
> > I knew that it has to be something with memory types, but it never occured to me
> > that guest just doesn't write IA32_PAT and uses our value which we set in init_vmcb
> > 
> > 
> > > In general I think it's better to be explicit with vmcb01 vs. vmcb02,
> > > like Cathy did, but I can see it's a matter of personal preference to
> > > some extent.
> > I also think so in general, but in the code that is outside 'is_guest_mode'
> > IMHO it is better to refer to vmcb01 as vmcb, although now that I think of
> > it, its not wrong to do so either.
> > 
> > 
> > My windows hyper-v guest doesn't boot though and I know why.
> > 
> > As we know the vmcb save area has extra state which vmrun/vmexit don't touch.
> > Now suppose a nested hypervisor wants to enter a nested guest.
> > 
> > It will do vmload, which will load the extra state from the nested vmcb (vmcb12
> > or as I woudl say the vmcb that nested hypervisor thinks that it is using),
> > to the CPU. This can cause some vmexits I think, but this doesn't matter much.
> > 
> > Now the nested hypervisor does vmrun. The extra state of L2 guest is in CPU registers,
> > and it is untouched. We do vmsave on vmcb01 to preserve that state, but later
> > when we do vmload on vmcb02 prior to vmenter on it, which loads stale state from it.
> > The same issue happens the other way around on nested vmexit.
> > 
> > I fixed this by doing nested_svm_vmloadsave, but that should be probably be 
> > optimized with dirty bits. Now though I guess the goal it to make
> > it work first.
> > 
> > With this fixed HyperV boots fine, and even passes the 'works' test of booting
> > the windows 10 with hyperv enabled nested itself and starting the vm inside,
> > which makes that VM L3 (in addition to windows itself that runs as L3 in relation to hyper-v)
> > 
> > https://i.imgur.com/sRYqtVV.png
> > 
> > In summary this is the diff of fixes (just pasted to email, probably mangled):
> > 
> > 
> > diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> > index 0a06e62010d8c..7293ba23b3cbc 100644
> > --- a/arch/x86/kvm/svm/nested.c
> > +++ b/arch/x86/kvm/svm/nested.c
> > @@ -436,6 +436,9 @@ int enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
> >         WARN_ON(svm->vmcb == svm->nested.vmcb02);
> >  
> >         svm->nested.vmcb02->control = svm->vmcb01->control;
> > +
> > +       nested_svm_vmloadsave(svm->vmcb01, svm->nested.vmcb02);
> > +
> >         svm->vmcb = svm->nested.vmcb02;
> >         svm->vmcb_pa = svm->nested.vmcb02_pa;
> >         load_nested_vmcb_control(svm, &nested_vmcb->control);
> > @@ -622,6 +625,7 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
> >         if (svm->vmcb01->control.asid == 0)
> >                 svm->vmcb01->control.asid = svm->nested.vmcb02->control.asid;
> >  
> > +       nested_svm_vmloadsave(svm->nested.vmcb02, svm->vmcb01);
> >         svm->vmcb = svm->vmcb01;
> >         svm->vmcb_pa = svm->nested.vmcb01_pa;
> >  
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index b66239b26885d..ee9f87fe611f2 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -1097,6 +1097,7 @@ static void init_vmcb(struct vcpu_svm *svm)
> >                 clr_cr_intercept(svm, INTERCEPT_CR3_READ);
> >                 clr_cr_intercept(svm, INTERCEPT_CR3_WRITE);
> >                 save->g_pat = svm->vcpu.arch.pat;
> > +               svm->nested.vmcb02->save.g_pat = svm->vcpu.arch.pat;
> >                 save->cr3 = 0;
> >                 save->cr4 = 0;
> >         }
> > 
> > 
> > 
> > Best regards,
> > 	Maxim Levitsky
> > 
> > > Paolo
> > > 
> And another thing I spotted before I forget.
> 
> If we setup a tlb flush in ctl.tlb_ctl of vmcb01, just prior to nested vmentry
> then this field will be copied to vmcb02 but on next vmexit we clear it in current
> (that is vmcb02) and that change will not propogate to vmcb01.
> 
> I am not sure if this is a theorerical issue or not. We probably should apply the same treatment to
> it as what Paulo suggested to do with asid - 
> set it just prior to vmentry if tlb flush is needed, and clear it afterwards as we do.

And yet another thing to note is that we curently ignore L2's g_pat. However it _seems_ that we practically
ignore L1 PAT as well in regard to shadowing NPT mmu. I am not 100% sure about this.
I'll dig that area eventually.

Best regards,
	Maxim Levitsky

> 
> Best regards,
> 	Maxim Levitsky





[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux