Re: [PATCH v2 5/8] KVM: nVMX: Fix guest CR3 read-back on VM-exit


On 2013-08-07 14:39, Gleb Natapov wrote:
> On Tue, Aug 06, 2013 at 05:57:02PM +0200, Jan Kiszka wrote:
>> On 2013-08-06 17:53, Gleb Natapov wrote:
>>> On Tue, Aug 06, 2013 at 05:48:54PM +0200, Jan Kiszka wrote:
>>>> On 2013-08-06 17:04, Zhang, Yang Z wrote:
>>>>> Gleb Natapov wrote on 2013-08-06:
>>>>>> On Tue, Aug 06, 2013 at 02:12:51PM +0000, Zhang, Yang Z wrote:
>>>>>>> Gleb Natapov wrote on 2013-08-06:
>>>>>>>> On Tue, Aug 06, 2013 at 11:44:41AM +0000, Zhang, Yang Z wrote:
>>>>>>>>> Gleb Natapov wrote on 2013-08-06:
>>>>>>>>>> On Tue, Aug 06, 2013 at 10:39:59AM +0200, Jan Kiszka wrote:
>>>>>>>>>>> From: Jan Kiszka <jan.kiszka@xxxxxxxxxxx>
>>>>>>>>>>>
>>>>>>>>>>> If nested EPT is enabled, the L2 guest may change CR3 without any
>>>>>>>>>>> exits. We therefore have to read the current value from the VMCS
>>>>>>>>>>> when switching to L1. However, if paging wasn't enabled, L0 tracks
>>>>>>>>>>> L2's CR3, and GUEST_CR3 rather contains the real-mode identity map.
>>>>>>>>>>> So we need to retrieve CR3 from the architectural state after
>>>>>>>>>>> conditionally updating it - and this is what kvm_read_cr3 does.
>>>>>>>>>>>
>>>>>>>>>> I have a headache from trying to think about it already, but
>>>>>>>>>> shouldn't L1 be the one who sets up the identity map for L2? I
>>>>>>>>>> traced what vmcs_read64(GUEST_CR3)/kvm_read_cr3(vcpu) return here
>>>>>>>>>> and do not see
>>>>>>>>> Here is my understanding:
>>>>>>>>> In vmx_set_cr3(), if EPT is enabled, it checks whether the target
>>>>>>>>> vcpu has paging enabled. When L2 is running in real mode, the
>>>>>>>>> target vcpu does not have paging enabled, so L0's identity map is
>>>>>>>>> used for L2. If you read GUEST_CR3 from the VMCS, you may get the
>>>>>>>>> identity map L0 set up for L2, not L1's.
>>>>>>>>>
>>>>>>>> Yes, but why does it make sense to use L0's identity map for L2? I
>>>>>>>> didn't see different vmcs_read64(GUEST_CR3)/kvm_read_cr3(vcpu) values
>>>>>>>> because L0 and L1 use the same identity map address. When I changed
>>>>>>>> the identity map address L1 configures, vmcs_read64(GUEST_CR3) and
>>>>>>>> kvm_read_cr3(vcpu) were indeed different, but the real CR3 L2 uses
>>>>>>>> points to L0's identity map. If I zero L1's identity map page, L2
>>>>>>>> still works.
>>>>>>>>
>>>>>>> If L2 is in real mode, then L2PA == L1PA, so L0's identity map also
>>>>>>> works for L2.
>>>>>>>
>>>>>> That's not the point. It may work accidentally for KVM on KVM, but
>>>>>> what if another hypervisor plays different tricks and builds a
>>>>>> different identity map for its guest?
>>>>> Yes, if another hypervisor doesn't build a 1:1 mapping for its guest,
>>>>> it will fail to work. But I cannot imagine what kind of hypervisor
>>>>> would do this, or for what purpose.
>>>>> Anyway, the current logic is definitely wrong. It should use L1's
>>>>> identity map instead of L0's.
>>>>
>>>> So is something like this needed instead?
>>>>
>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>>> index 44494ed..60a3644 100644
>>>> --- a/arch/x86/kvm/vmx.c
>>>> +++ b/arch/x86/kvm/vmx.c
>>>> @@ -3375,8 +3375,10 @@ static void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>>>>  	if (enable_ept) {
>>>>  		eptp = construct_eptp(cr3);
>>>>  		vmcs_write64(EPT_POINTER, eptp);
>>>> -		guest_cr3 = is_paging(vcpu) ? kvm_read_cr3(vcpu) :
>>>> -			vcpu->kvm->arch.ept_identity_map_addr;
>>>> +		if (is_paging(vcpu) || is_guest_mode(vcpu))
>>>> +			guest_cr3 = kvm_read_cr3(vcpu);
>>>> +		else
>>>> +			guest_cr3 = vcpu->kvm->arch.ept_identity_map_addr;
>>>>  		ept_load_pdptrs(vcpu);
>>>>  	}
>>>>  
>>> That's what I am thinking; I will think about it some more tomorrow.
>>
>> OK, and I'll feed it into a local test.
>>
> Thought about it some more. Without nested unrestricted guest (nUG),
> is_paging() will always be true here (without nUG, guest entry with
> paging disabled is not possible) and the guest's cr3 will be used. With
> nUG, the identity map is not used at all (that is why L2 still works
> even though a wrong identity map pointer is assigned to cr3), so the
> code here just corrupts the nested guest's cr3 for no reason, and that
> is why you had to use kvm_read_cr3() in prepare_vmcs12() to get the
> correct cr3 value. IMO the patch above should be used instead of the
> original one. How is testing going?

Yes, testing worked fine. I've queued the above patch and will send it
out with the next round.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html