Re: [PATCH v7 2/5] KVM: x86: Virtualize CR3.LAM_{U48,U57}

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Apr 21, 2023 at 07:43:52PM +0800, Huang, Kai wrote:
>On Fri, 2023-04-21 at 14:35 +0800, Binbin Wu wrote:
>> On 4/13/2023 5:13 PM, Huang, Kai wrote:
>> > > On 4/13/2023 10:27 AM, Huang, Kai wrote:
>> > > > On Thu, 2023-04-13 at 09:36 +0800, Binbin Wu wrote:
>> > > > > On 4/12/2023 7:58 PM, Huang, Kai wrote:
>> > > > > 
>> > > ...
>> > > > > > > +	root_gfn = (root_pgd & __PT_BASE_ADDR_MASK) >> PAGE_SHIFT;
>> > > > > > Or, should we explicitly mask vcpu->arch.cr3_ctrl_bits?  In this
>> > > > > > way, below
>> > > > > > mmu_check_root() may potentially catch other invalid bits, but in
>> > > > > > practice there should be no difference I guess.
>> > > > > In previous version, vcpu->arch.cr3_ctrl_bits was used as the mask.
>> > > > > 
>> > > > > However, Sean pointed out that the return value of
>> > > > > mmu->get_guest_pgd(vcpu) could be
>> > > > > EPTP for nested case, so it is not rational to mask to CR3 bit(s) from EPTP.
>> > > > Yes, although EPTP's high bits don't contain any control bits.
>> > > > 
>> > > > But perhaps we want to make it future-proof in case some more control
>> > > > bits are added to EPTP too.
>> > > > 
>> > > > > Since the guest pgd has been check for valadity, for both CR3 and
>> > > > > EPTP, it is safe to mask off non-address bits to get GFN.
>> > > > > 
>> > > > > Maybe I should add this CR3 VS. EPTP part to the changelog to make it
>> > > > > more undertandable.
>> > > > This isn't necessary, and can/should be done in comments if needed.
>> > > > 
>> > > > But IMHO you may want to add more material to explain how nested cases
>> > > > are handled.
>> > > Do you mean about CR3 or others?
>> > > 
>> > This patch is about CR3, so CR3.
>> 
>> For nested case, I plan to add the following in the changelog:
>> 
>>      For nested guest:
>>      - If CR3 is intercepted, after CR3 handled in L1, CR3 will be 
>> checked in
>>        nested_vmx_load_cr3() before returning back to L2.
>>      - For the shadow paging case (SPT02), LAM bits are also be handled 
>> to form
>>        a new shadow CR3 in vmx_load_mmu_pgd().
>> 
>> 
>
>I don't know a lot of code detail of KVM nested code, but in general, since your
>code only changes nested_vmx_load_cr3() and nested_vmx_check_host_state(), the
>changelog should focus on explaining why modifying these two functions are good
>enough.

the patch relaxes all existing checks on CR3, allowing previously reserved bits
to be set. It should be enough otherwise some checks on CR3 are missing in the
first place.

>
>And to explain this, I think we should explain from hardware's perspective
>rather than from shadow paging's perspective.
>
>From L0's perspective, we care about:
>
>	1) L1's CR3 register (VMCS01's GUEST_CR3)
>	2) L1's VMCS to run L2 guest
>		2.1) VMCS12's HOST_CR3 
>		2.2) VMCS12's GUEST_CR3
>
>For 1) the current changelog has explained (that we need to catch _active_
>control bits in guest's CR3 etc).  For nested (case 2)), L1 can choose to
>intercept CR3 or not.  But this isn't the point because from hardware's point of
>view we actually care about below two cases instead:
>
>	1) L0 to emulate a VMExit from L2 to L1 by VMENTER using VMCS01 
>	   to reflect VMCS12
>	2) L0 to VMENTER to L2 using VMCS02 directly
>
>The case 2) can fail due to fail to VMENTER to L2 of course but this should
>result in L0 to VMENTER to L1 with a emulated VMEXIT from L2 to L1 which is the
>case 1).
>
>For case 1) we need new code to check VMCS12's HOST_CR3 against guest's _active_
>CR3 feature control bits.  The key code path is:
>
>	vmx_handle_exit()
>		-> prepare_vmcs12()
>		-> load_vmcs12_host_state().  
>
>For case 2) I _think_ we need new code to check both VMCS12's HOST_CR3 and
>GUEST_CR3 against active control bits.  The key code path is 

...

>
>	nested_vmx_run() -> 
>		-> nested_vmx_check_host_state()
>		-> nested_vmx_enter_non_root_mode()
>			-> prepare_vmcs02_early()
>			-> prepare_vmcs02()
>
>Since nested_vmx_load_cr3() is used in both VMENTER using VMCS12's HOST_CR3
>(VMEXIT to L1) and GUEST_CR3 (VMENTER to L2), and it currently already checks
>kvm_vcpu_is_illegal_gpa(vcpu, cr3), changing it to additionally check guest CR3
>active control bits seems just enough.

IMO, current KVM relies on hardware to do consistency check on the GUEST_CR3
during VM entry.

>
>Also, nested_vmx_check_host_state() (i.e. it is called in nested_vmx_run() to
>return early if any HOST state is wrong) currently checks
>kvm_vcpu_is_illegal_gpa(vcpu, cr3) too so we should also change it.
>
>That being said, I do find it's not easy to come up with a "concise" changelog
>to cover both non-nested and (especially) nested cases, but it seems we can
>abstract some from my above typing?  
>
>My suggestion is to focus on the hardware behaviour's perspective as typed
>above.  But I believe Sean can easily make a "concise" changelog if he wants to
>comment here :)



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux