Re: [PATCH v2] KVM: nVMX: Fix setting of CR0 and CR4 in guest mode

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2013-03-07 09:57, Gleb Natapov wrote:
> On Thu, Mar 07, 2013 at 09:53:49AM +0100, Jan Kiszka wrote:
>> On 2013-03-07 09:43, Gleb Natapov wrote:
>>> On Thu, Mar 07, 2013 at 09:12:19AM +0100, Jan Kiszka wrote:
>>>> On 2013-03-07 08:51, Gleb Natapov wrote:
>>>>> On Mon, Mar 04, 2013 at 08:40:29PM +0100, Jan Kiszka wrote:
>>>>>> The logic for calculating the value with which we call kvm_set_cr0/4 was
>>>>>> broken (will definitely be visible with nested unrestricted guest mode
>>>>>> support). Also, we performed the check regarding CR0_ALWAYSON too early
>>>>>> when in guest mode.
>>>>>>
>>>>>> What really needs to be done on both CR0 and CR4 is to mask out L1-owned
>>>>>> bits and merge them in from GUEST_CR0/4. In contrast, arch.cr0/4 and
>>>>>> arch.cr0/4_guest_owned_bits contain the mangled L0+L1 state and, thus,
>>>>>> are not suited as input.
>>>>>>
>>>>>> For both CRs, we can then apply the check against VMXON_CRx_ALWAYSON and
>>>>>> refuse the update if it fails. To be fully consistent, we implement this
>>>>>> check now also for CR4.
>>>>>>
>>>>>> Finally, we have to set the shadow to the value L2 wanted to write
>>>>>> originally.
>>>>>>
>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@xxxxxxxxxxx>
>>>>>> ---
>>>>>>
>>>>>> Changes in v2:
>>>>>>  - keep the non-misleading part of the comment in handle_set_cr0
>>>>>>
>>>>>>  arch/x86/kvm/vmx.c |   46 +++++++++++++++++++++++++++++++---------------
>>>>>>  1 files changed, 31 insertions(+), 15 deletions(-)
>>>>>>
>>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>>>>> index 7cc566b..832b7b4 100644
>>>>>> --- a/arch/x86/kvm/vmx.c
>>>>>> +++ b/arch/x86/kvm/vmx.c
>>>>>> @@ -4605,37 +4605,53 @@ vmx_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall)
>>>>>>  /* called to set cr0 as appropriate for a mov-to-cr0 exit. */
>>>>>>  static int handle_set_cr0(struct kvm_vcpu *vcpu, unsigned long val)
>>>>>>  {
>>>>>> -	if (to_vmx(vcpu)->nested.vmxon &&
>>>>>> -	    ((val & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON))
>>>>>> -		return 1;
>>>>>> -
>>>>>>  	if (is_guest_mode(vcpu)) {
>>>>>> +		struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
>>>>>> +		unsigned long orig_val = val;
>>>>>> +
>>>>>>  		/*
>>>>>>  		 * We get here when L2 changed cr0 in a way that did not change
>>>>>>  		 * any of L1's shadowed bits (see nested_vmx_exit_handled_cr),
>>>>>> -		 * but did change L0 shadowed bits. This can currently happen
>>>>>> -		 * with the TS bit: L0 may want to leave TS on (for lazy fpu
>>>>>> -		 * loading) while pretending to allow the guest to change it.
>>>>>> +		 * but did change L0 shadowed bits.
>>>>>>  		 */
>>>>>> -		if (kvm_set_cr0(vcpu, (val & vcpu->arch.cr0_guest_owned_bits) |
>>>>>> -			 (vcpu->arch.cr0 & ~vcpu->arch.cr0_guest_owned_bits)))
>>>>>> +		val = (val & ~vmcs12->cr0_guest_host_mask) |
>>>>>> +			(vmcs_read64(GUEST_CR0) & vmcs12->cr0_guest_host_mask);
>>>>> I think using GUEST_CR0 here is incorrect. It contains combination of bits
>>>>> set by L2, L1 and L0 and here we need to get only L2/L1 mix which is in
>>>>> vcpu->arch.cr0 (almost, but good enough for this case). Why vcpu->arch.cr0
>>>>> contains right L2/L1 mix?
>>>>
>>>> L0/L1. E.g., kvm_set_cr0 unconditionally injects X86_CR0_ET and masks
>>>> out reserved bits. But you are right, GUEST_CR0 isn't much better. And
>>>> maybe that mangling in kvm_set_cr0 is a corner case we can ignore.
>>>>
>>> I think we can. ET is R/O and wired to 1, so it does not matter what
>>> guest writes there it should be treated as 1. About reserved bits spec
>>> says that software should write what it reads there and does not specify
>>> what happens if software does not follow this.
>>>
>>>>> Because it was set to vmcs12->guest_cr0 during
>>>>> L2 #vmentry. While L2 is running three things may happen to CR0:
>>>>>
>>>>>  1. L2 writes to a bit that is not shadowed neither by L1 nor by L0. It
>>>>>     will go strait to GUEST_CR0.
>>>>>  2. L2 writes to a bit shadowed by L1. L1 #vmexit will be emulated. On the
>>>>>     next #vmetry vcpu->arch.cr0 will be set to whatever value L1 calculated.
>>>>>  3. L2 writes to a bit shadowed by L0, but not L1. This is the case we
>>>>>     are handling here. And if we will do it right vcpu->arch.cr0 will be
>>>>>     up-to-date at the end.
>>>>>
>>>>> The only case when, while this code running, vcpu->arch.cr0 has not
>>>>> up-to-date value is if 1 happened, but since L2 guest overwriting cr0
>>>>> here anyway it does not matter what it previously set in GUEST_CR0. The
>>>>> correct bits are in a new cr0 value provided by val and accessible by
>>>>> (val & ~vmcs12->cr0_guest_host_mask).
>>>>
>>>> I need to think about it again. Maybe vmcs12->guest_cr0 is best, but
>>>> that's a shot from the hips now.
>>>>
>>> I do not think it is correct because case 3 does not update it. So if 3
>>> happens twice without L1 #vmexit between then vmcs12->guest_cr0 will be
>>> outdated.
>>
>> Again, the only thing that matters here is L1's, not L0's view on the
>> "real" CR0 value. So guest_cr0 is never outdated (/wrt
>> cr0_guest_host_mask) as it will be updated by L1 in step 2. Even if
>> arch.cr0 vs. guest_cr0 makes no difference in practice, the latter is
>> more consistent, so I will go for it unless you can convince me it is wrong.
>>
> Hmm, yes you are right that wrt cr0_guest_host_mask guest_cr0 should be
> up-to-date. Please write a big comment about it.

Will do.

> And what about moving VMXON_CR0_ALWAYSON check into vmx_set_cr0()?

That doesn't make much sense for CR0 (due to the differences between
vmxon and guest mode - and lacking return code of set_cr4). But I can
consolidate the CR4 checks.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux