Re: [PATCH] KVM: nVMX: Fix setting of CR0 and CR4 in guest mode

On 2013-03-04 19:39, Gleb Natapov wrote:
> On Mon, Mar 04, 2013 at 07:08:08PM +0100, Jan Kiszka wrote:
>> On 2013-03-04 18:56, Gleb Natapov wrote:
>>> On Mon, Mar 04, 2013 at 03:25:47PM +0100, Jan Kiszka wrote:
>>>> On 2013-03-04 15:15, Gleb Natapov wrote:
>>>>> On Mon, Mar 04, 2013 at 03:09:51PM +0100, Jan Kiszka wrote:
>>>>>> On 2013-03-04 14:22, Gleb Natapov wrote:
>>>>>>> On Thu, Feb 28, 2013 at 10:44:47AM +0100, Jan Kiszka wrote:
>>>>>>>> The logic for calculating the value with which we call kvm_set_cr0/4 was
>>>>>>>> broken (will definitely be visible with nested unrestricted guest mode
>>>>>>>> support). Also, we performed the check regarding CR0_ALWAYSON too early
>>>>>>>> when in guest mode.
>>>>>>>>
>>>>>>>> What really needs to be done on both CR0 and CR4 is to mask out L1-owned
>>>>>>>> bits and merge them in from GUEST_CR0/4. In contrast, arch.cr0/4 and
>>>>>>>> arch.cr0/4_guest_owned_bits contain the mangled L0+L1 state and, thus,
>>>>>>>> are not suited as input.
>>>>>>>>
>>>>>>>> For both CRs, we can then apply the check against VMXON_CRx_ALWAYSON and
>>>>>>>> refuse the update if it fails. To be fully consistent, we implement this
>>>>>>>> check now also for CR4.
>>>>>>>>
>>>>>>>> Finally, we have to set the shadow to the value L2 wanted to write
>>>>>>>> originally.
>>>>>>>>
>>>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@xxxxxxxxxxx>
>>>>>>>> ---
>>>>>>>>
>>>>>>>> Found while getting unrestricted guest mode to work. Not sure what impact
>>>>>>>> the bugs had at the current feature level, if any.
>>>>>>>>
>>>>>>>> For interested folks, I've pushed my nEPT environment here:
>>>>>>>>
>>>>>>>>     git://git.kiszka.org/linux-kvm.git nept-hacking
>>>>>>>>
>>>>>>>>  arch/x86/kvm/vmx.c |   49 ++++++++++++++++++++++++++++++-------------------
>>>>>>>>  1 files changed, 30 insertions(+), 19 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>>>>>>> index 7cc566b..d1dac08 100644
>>>>>>>> --- a/arch/x86/kvm/vmx.c
>>>>>>>> +++ b/arch/x86/kvm/vmx.c
>>>>>>>> @@ -4605,37 +4605,48 @@ vmx_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall)
>>>>>>>>  /* called to set cr0 as appropriate for a mov-to-cr0 exit. */
>>>>>>>>  static int handle_set_cr0(struct kvm_vcpu *vcpu, unsigned long val)
>>>>>>>>  {
>>>>>>>> -	if (to_vmx(vcpu)->nested.vmxon &&
>>>>>>>> -	    ((val & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON))
>>>>>>>> -		return 1;
>>>>>>>> -
>>>>>>>>  	if (is_guest_mode(vcpu)) {
>>>>>>>> -		/*
>>>>>>>> -		 * We get here when L2 changed cr0 in a way that did not change
>>>>>>>> -		 * any of L1's shadowed bits (see nested_vmx_exit_handled_cr),
>>>>>>>> -		 * but did change L0 shadowed bits. This can currently happen
>>>>>>>> -		 * with the TS bit: L0 may want to leave TS on (for lazy fpu
>>>>>>>> -		 * loading) while pretending to allow the guest to change it.
>>>>>>>> -		 */
>>>>>>> Can't say I understand this patch yet, but it looks like the comment is
>>>>>>> still valid. Why have you removed it?
>>>>>>
>>>>>> L0 allows L1 or L2 to own at most TS; the rest is host-owned. I think
>>>>>> the comment was always misleading.
>>>>>>
>>>>> I do not see how it is misleading. For everything but TS we will not get
>>>>> here (if L1 is kvm). For TS we will get here if L1 allows L2 to change
>>>>> it, but L0 does not.
>>>>
>>>> For everything *but guest-owned* we will get here, thus for most CR0
>>>> accesses (bit-wise, not regarding frequency).
>>>>
>>> I do not see how. If a bit is trapped by L1, we will not get here; we will
>>> do a vmexit to L1 instead. nested_vmx_exit_handled_cr() checks this condition.
>>> I am not arguing about your code (didn't grok it yet), but the comment
>>> still makes sense to me.
>>
>> "We get here when L2 changed cr0 in a way that did not change any of
>> L1's shadowed bits (see nested_vmx_exit_handled_cr), but did change L0
>> shadowed bits." That part I can sign off on. But the rest about TS is just
>> misleading, as we trap _every_ change in L0 - except for TS under certain
>> conditions. The old code was tested against TS only; that is what the
>> comment reflects.
>>
> TS is just an example of how we can get here with KVM on KVM. Obviously
> other hypervisors may have different configuration. L2 may allow full
> guest access to CR0 and then each CR0 write by L2 will be handled here.
> Under what other conditions do "we trap _every_ change in L0 - except for
> TS" here?

On FPU activation:

    cr0_guest_owned_bits = X86_CR0_TS;

And on FPU deactivation:

    cr0_guest_owned_bits = 0;

> 
>> If you prefer, I'll leave part one in.
>>
> Please do so. Without the comment it is not obvious why the exit condition
> is not checked here. I still do not see why you object to the TS part.

It describes a corner case in a way that suggests this is the only
reason why we get here.

Jan



