On Mon, Mar 04, 2013 at 08:40:29PM +0100, Jan Kiszka wrote: > The logic for calculating the value with which we call kvm_set_cr0/4 was > broken (will definitely be visible with nested unrestricted guest mode > support). Also, we performed the check regarding CR0_ALWAYSON too early > when in guest mode. > > What really needs to be done on both CR0 and CR4 is to mask out L1-owned > bits and merge them in from GUEST_CR0/4. In contrast, arch.cr0/4 and > arch.cr0/4_guest_owned_bits contain the mangled L0+L1 state and, thus, > are not suited as input. > > For both CRs, we can then apply the check against VMXON_CRx_ALWAYSON and > refuse the update if it fails. To be fully consistent, we implement this > check now also for CR4. > > Finally, we have to set the shadow to the value L2 wanted to write > originally. > > Signed-off-by: Jan Kiszka <jan.kiszka@xxxxxxxxxxx> > --- > > Changes in v2: > - keep the non-misleading part of the comment in handle_set_cr0 > > arch/x86/kvm/vmx.c | 46 +++++++++++++++++++++++++++++++--------------- > 1 files changed, 31 insertions(+), 15 deletions(-) > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > index 7cc566b..832b7b4 100644 > --- a/arch/x86/kvm/vmx.c > +++ b/arch/x86/kvm/vmx.c > @@ -4605,37 +4605,53 @@ vmx_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall) > /* called to set cr0 as appropriate for a mov-to-cr0 exit. */ > static int handle_set_cr0(struct kvm_vcpu *vcpu, unsigned long val) > { > - if (to_vmx(vcpu)->nested.vmxon && > - ((val & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON)) > - return 1; > - > if (is_guest_mode(vcpu)) { > + struct vmcs12 *vmcs12 = get_vmcs12(vcpu); > + unsigned long orig_val = val; > + > /* > * We get here when L2 changed cr0 in a way that did not change > * any of L1's shadowed bits (see nested_vmx_exit_handled_cr), > - * but did change L0 shadowed bits. This can currently happen > - * with the TS bit: L0 may want to leave TS on (for lazy fpu > - * loading) while pretending to allow the guest to change it. > + * but did change L0 shadowed bits. > */ > - if (kvm_set_cr0(vcpu, (val & vcpu->arch.cr0_guest_owned_bits) | > - (vcpu->arch.cr0 & ~vcpu->arch.cr0_guest_owned_bits))) > + val = (val & ~vmcs12->cr0_guest_host_mask) | > + (vmcs_read64(GUEST_CR0) & vmcs12->cr0_guest_host_mask); I think using GUEST_CR0 here is incorrect. It contains combination of bits set by L2, L1 and L0 and here we need to get only L2/L1 mix which is in vcpu->arch.cr0 (almost, but good enough for this case). Why vcpu->arch.cr0 contains right L2/L1 mix? Because it was set to vmcs12->guest_cr0 during L2 #vmentry. While L2 is running three things may happen to CR0: 1. L2 writes to a bit that is not shadowed neither by L1 nor by L0. It will go strait to GUEST_CR0. 2. L2 writes to a bit shadowed by L1. L1 #vmexit will be emulated. On the next #vmetry vcpu->arch.cr0 will be set to whatever value L1 calculated. 3. L2 writes to a bit shadowed by L0, but not L1. This is the case we are handling here. And if we will do it right vcpu->arch.cr0 will be up-to-date at the end. The only case when, while this code running, vcpu->arch.cr0 has not up-to-date value is if 1 happened, but since L2 guest overwriting cr0 here anyway it does not matter what it previously set in GUEST_CR0. The correct bits are in a new cr0 value provided by val and accessible by (val & ~vmcs12->cr0_guest_host_mask). -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html