Re: [PATCH] KVM: nVMX: Fix setting of CR0 and CR4 in guest mode

Gleb Natapov <gleb@xxxxxxxxxx> · Mon, 4 Mar 2013 20:39:56 +0200



On Mon, Mar 04, 2013 at 07:08:08PM +0100, Jan Kiszka wrote:
> On 2013-03-04 18:56, Gleb Natapov wrote:
> > On Mon, Mar 04, 2013 at 03:25:47PM +0100, Jan Kiszka wrote:
> >> On 2013-03-04 15:15, Gleb Natapov wrote:
> >>> On Mon, Mar 04, 2013 at 03:09:51PM +0100, Jan Kiszka wrote:
> >>>> On 2013-03-04 14:22, Gleb Natapov wrote:
> >>>>> On Thu, Feb 28, 2013 at 10:44:47AM +0100, Jan Kiszka wrote:
> >>>>>> The logic for calculating the value with which we call kvm_set_cr0/4 was
> >>>>>> broken (will definitely be visible with nested unrestricted guest mode
> >>>>>> support). Also, we performed the check regarding CR0_ALWAYSON too early
> >>>>>> when in guest mode.
> >>>>>>
> >>>>>> What really needs to be done on both CR0 and CR4 is to mask out L1-owned
> >>>>>> bits and merge them in from GUEST_CR0/4. In contrast, arch.cr0/4 and
> >>>>>> arch.cr0/4_guest_owned_bits contain the mangled L0+L1 state and, thus,
> >>>>>> are not suited as input.
> >>>>>>
> >>>>>> For both CRs, we can then apply the check against VMXON_CRx_ALWAYSON and
> >>>>>> refuse the update if it fails. To be fully consistent, we implement this
> >>>>>> check now also for CR4.
> >>>>>>
> >>>>>> Finally, we have to set the shadow to the value L2 wanted to write
> >>>>>> originally.
> >>>>>>
> >>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@xxxxxxxxxxx>
> >>>>>> ---
> >>>>>>
> >>>>>> Found while getting unrestricted guest mode working. Not sure what impact
> >>>>>> the bugs had at the current feature level, if any.
> >>>>>>
> >>>>>> For interested folks, I've pushed my nEPT environment here:
> >>>>>>
> >>>>>>     git://git.kiszka.org/linux-kvm.git nept-hacking
> >>>>>>
> >>>>>>  arch/x86/kvm/vmx.c |   49 ++++++++++++++++++++++++++++++-------------------
> >>>>>>  1 files changed, 30 insertions(+), 19 deletions(-)
> >>>>>>
> >>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >>>>>> index 7cc566b..d1dac08 100644
> >>>>>> --- a/arch/x86/kvm/vmx.c
> >>>>>> +++ b/arch/x86/kvm/vmx.c
> >>>>>> @@ -4605,37 +4605,48 @@ vmx_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall)
> >>>>>>  /* called to set cr0 as appropriate for a mov-to-cr0 exit. */
> >>>>>>  static int handle_set_cr0(struct kvm_vcpu *vcpu, unsigned long val)
> >>>>>>  {
> >>>>>> -	if (to_vmx(vcpu)->nested.vmxon &&
> >>>>>> -	    ((val & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON))
> >>>>>> -		return 1;
> >>>>>> -
> >>>>>>  	if (is_guest_mode(vcpu)) {
> >>>>>> -		/*
> >>>>>> -		 * We get here when L2 changed cr0 in a way that did not change
> >>>>>> -		 * any of L1's shadowed bits (see nested_vmx_exit_handled_cr),
> >>>>>> -		 * but did change L0 shadowed bits. This can currently happen
> >>>>>> -		 * with the TS bit: L0 may want to leave TS on (for lazy fpu
> >>>>>> -		 * loading) while pretending to allow the guest to change it.
> >>>>>> -		 */
> >>>>> Can't say I understand this patch yet, but it looks like the comment is
> >>>>> still valid. Why have you removed it?
> >>>>
> >>>> L0 allows L1 or L2 at most to own TS, the rest is host-owned. I think
> >>>> the comment was always misleading.
> >>>>
> >>> I do not see how it is misleading. For everything but TS we will not get
> >>> here (if L1 is kvm). For TS we will get here if L1 allows L2 to change
> >>> it, but L0 does not.
> >>
> >> For everything *but guest-owned* we will get here, thus for most CR0
> >> accesses (bit-wise, not regarding frequency).
> >>
> > I do not see how. If a bit is trapped by L1 we will not get here. We will
> > do a vmexit to L1 instead. nested_vmx_exit_handled_cr() checks this condition.
> > I am not arguing about your code (didn't grok it yet), but the comment
> > still makes sense to me.
> 
> "We get here when L2 changed cr0 in a way that did not change any of
> L1's shadowed bits (see nested_vmx_exit_handled_cr), but did change L0
> shadowed bits." That I can sign. But the rest about TS is just
> misleading as we trap _every_ change in L0 - except for TS under certain
> conditions. The old code was tested against TS only; that is what the
> comment bears witness to.
> 
TS is just an example of how we can get here with KVM on KVM. Obviously
other hypervisors may have a different configuration. L1 may allow L2 full
access to CR0, and then each CR0 write by L2 will be handled here.
Under what other condition do "we trap _every_ change in L0 - except for
TS" here?
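
To make the routing concrete, here is a minimal standalone sketch, not the
actual vmx.c code. The struct and function names are hypothetical; the
fields only mirror the CR0 guest/host mask, read shadow and guest CR0 that
L1 sets up for L2. The first helper models the condition that
nested_vmx_exit_handled_cr() is described as checking above: a write goes
to L1 only if it changes a bit L1 shadows. The second models the merge from
the patch description (L2-owned bits from the new value, L1-owned bits from
L1's guest CR0).

```c
#include <stdbool.h>

/* Architectural CR0 bit positions. */
#define X86_CR0_PE (1UL << 0)
#define X86_CR0_TS (1UL << 3)

/* Hypothetical, trimmed-down stand-in for the CR0 state L1 sets up for L2. */
struct l1_cr0_setup {
	unsigned long guest_host_mask; /* bits L1 traps, i.e. L1-owned bits */
	unsigned long read_shadow;     /* value L2 reads back for those bits */
	unsigned long guest_cr0;       /* CR0 value L1 wants L2 to run with */
};

/*
 * True if L2's write changes a bit that L1 shadows, so the exit must be
 * reflected to L1. False means L0 handles the write itself, which is the
 * "we get here" case discussed in this thread.
 */
static bool cr0_write_exits_to_l1(const struct l1_cr0_setup *s,
				  unsigned long val)
{
	return (s->guest_host_mask & (val ^ s->read_shadow)) != 0;
}

/*
 * Effective CR0 for a write handled in L0: take the L2-owned bits from the
 * value L2 wrote and the L1-owned bits from L1's guest CR0.
 */
static unsigned long effective_cr0(const struct l1_cr0_setup *s,
				   unsigned long val)
{
	return (val & ~s->guest_host_mask) |
	       (s->guest_cr0 & s->guest_host_mask);
}
```

With a mask covering everything but TS, only a TS change is handled in L0.
With an empty mask (full guest access), the first helper never returns true,
so every CR0 write by L2 that exits at all is handled in L0.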

> If you prefer, I'll leave part one in.
> 
Please do so. Without the comment it is not obvious why the exit condition
is not checked here. I still do not see why you object to the TS part.

--
			Gleb.