Re: [PATCH v7 2/5] KVM: x86: Virtualize CR3.LAM_{U48,U57}

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 2023-04-21 at 14:35 +0800, Binbin Wu wrote:
> On 4/13/2023 5:13 PM, Huang, Kai wrote:
> > > On 4/13/2023 10:27 AM, Huang, Kai wrote:
> > > > On Thu, 2023-04-13 at 09:36 +0800, Binbin Wu wrote:
> > > > > On 4/12/2023 7:58 PM, Huang, Kai wrote:
> > > > > 
> > > ...
> > > > > > > +	root_gfn = (root_pgd & __PT_BASE_ADDR_MASK) >> PAGE_SHIFT;
> > > > > > Or, should we explicitly mask vcpu->arch.cr3_ctrl_bits?  In this
> > > > > > way, below
> > > > > > mmu_check_root() may potentially catch other invalid bits, but in
> > > > > > practice there should be no difference I guess.
> > > > > In previous version, vcpu->arch.cr3_ctrl_bits was used as the mask.
> > > > > 
> > > > > However, Sean pointed out that the return value of
> > > > > mmu->get_guest_pgd(vcpu) could be
> > > > > EPTP for nested case, so it is not rational to mask to CR3 bit(s) from EPTP.
> > > > Yes, although EPTP's high bits don't contain any control bits.
> > > > 
> > > > But perhaps we want to make it future-proof in case some more control
> > > > bits are added to EPTP too.
> > > > 
> > > > > Since the guest pgd has been check for valadity, for both CR3 and
> > > > > EPTP, it is safe to mask off non-address bits to get GFN.
> > > > > 
> > > > > Maybe I should add this CR3 VS. EPTP part to the changelog to make it
> > > > > more undertandable.
> > > > This isn't necessary, and can/should be done in comments if needed.
> > > > 
> > > > But IMHO you may want to add more material to explain how nested cases
> > > > are handled.
> > > Do you mean about CR3 or others?
> > > 
> > This patch is about CR3, so CR3.
> 
> For nested case, I plan to add the following in the changelog:
> 
>      For nested guest:
>      - If CR3 is intercepted, after CR3 handled in L1, CR3 will be 
> checked in
>        nested_vmx_load_cr3() before returning back to L2.
>      - For the shadow paging case (SPT02), LAM bits are also be handled 
> to form
>        a new shadow CR3 in vmx_load_mmu_pgd().
> 
> 

I don't know a lot of code detail of KVM nested code, but in general, since your
code only changes nested_vmx_load_cr3() and nested_vmx_check_host_state(), the
changelog should focus on explaining why modifying these two functions are good
enough.

And to explain this, I think we should explain from hardware's perspective
rather than from shadow paging's perspective.

>From L0's perspective, we care about:

	1) L1's CR3 register (VMCS01's GUEST_CR3)
	2) L1's VMCS to run L2 guest
		2.1) VMCS12's HOST_CR3 
		2.2) VMCS12's GUEST_CR3

For 1) the current changelog has explained (that we need to catch _active_
control bits in guest's CR3 etc).  For nested (case 2)), L1 can choose to
intercept CR3 or not.  But this isn't the point because from hardware's point of
view we actually care about below two cases instead:

	1) L0 to emulate a VMExit from L2 to L1 by VMENTER using VMCS01 
	   to reflect VMCS12
	2) L0 to VMENTER to L2 using VMCS02 directly

The case 2) can fail due to fail to VMENTER to L2 of course but this should
result in L0 to VMENTER to L1 with a emulated VMEXIT from L2 to L1 which is the
case 1).

For case 1) we need new code to check VMCS12's HOST_CR3 against guest's _active_
CR3 feature control bits.  The key code path is:

	vmx_handle_exit()
		-> prepare_vmcs12()
		-> load_vmcs12_host_state().  

For case 2) I _think_ we need new code to check both VMCS12's HOST_CR3 and
GUEST_CR3 against active control bits.  The key code path is 

	nested_vmx_run() -> 
		-> nested_vmx_check_host_state()
		-> nested_vmx_enter_non_root_mode()
			-> prepare_vmcs02_early()
			-> prepare_vmcs02()

Since nested_vmx_load_cr3() is used in both VMENTER using VMCS12's HOST_CR3
(VMEXIT to L1) and GUEST_CR3 (VMENTER to L2), and it currently already checks
kvm_vcpu_is_illegal_gpa(vcpu, cr3), changing it to additionally check guest CR3
active control bits seems just enough.

Also, nested_vmx_check_host_state() (i.e. it is called in nested_vmx_run() to
return early if any HOST state is wrong) currently checks
kvm_vcpu_is_illegal_gpa(vcpu, cr3) too so we should also change it.

That being said, I do find it's not easy to come up with a "concise" changelog
to cover both non-nested and (especially) nested cases, but it seems we can
abstract some from my above typing?  

My suggestion is to focus on the hardware behaviour's perspective as typed
above.  But I believe Sean can easily make a "concise" changelog if he wants to
comment here :)




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux