On Thu, Mar 07, 2013 at 11:37:43AM +0100, Jan Kiszka wrote: > On 2013-03-07 09:57, Gleb Natapov wrote: > > On Thu, Mar 07, 2013 at 09:53:49AM +0100, Jan Kiszka wrote: > >> On 2013-03-07 09:43, Gleb Natapov wrote: > >>> On Thu, Mar 07, 2013 at 09:12:19AM +0100, Jan Kiszka wrote: > >>>> On 2013-03-07 08:51, Gleb Natapov wrote: > >>>>> On Mon, Mar 04, 2013 at 08:40:29PM +0100, Jan Kiszka wrote: > >>>>>> The logic for calculating the value with which we call kvm_set_cr0/4 was > >>>>>> broken (will definitely be visible with nested unrestricted guest mode > >>>>>> support). Also, we performed the check regarding CR0_ALWAYSON too early > >>>>>> when in guest mode. > >>>>>> > >>>>>> What really needs to be done on both CR0 and CR4 is to mask out L1-owned > >>>>>> bits and merge them in from GUEST_CR0/4. In contrast, arch.cr0/4 and > >>>>>> arch.cr0/4_guest_owned_bits contain the mangled L0+L1 state and, thus, > >>>>>> are not suited as input. > >>>>>> > >>>>>> For both CRs, we can then apply the check against VMXON_CRx_ALWAYSON and > >>>>>> refuse the update if it fails. To be fully consistent, we implement this > >>>>>> check now also for CR4. > >>>>>> > >>>>>> Finally, we have to set the shadow to the value L2 wanted to write > >>>>>> originally. > >>>>>> > >>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@xxxxxxxxxxx> > >>>>>> --- > >>>>>> > >>>>>> Changes in v2: > >>>>>> - keep the non-misleading part of the comment in handle_set_cr0 > >>>>>> > >>>>>> arch/x86/kvm/vmx.c | 46 +++++++++++++++++++++++++++++++--------------- > >>>>>> 1 files changed, 31 insertions(+), 15 deletions(-) > >>>>>> > >>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > >>>>>> index 7cc566b..832b7b4 100644 > >>>>>> --- a/arch/x86/kvm/vmx.c > >>>>>> +++ b/arch/x86/kvm/vmx.c > >>>>>> @@ -4605,37 +4605,53 @@ vmx_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall) > >>>>>> /* called to set cr0 as appropriate for a mov-to-cr0 exit. */ > >>>>>> static int handle_set_cr0(struct kvm_vcpu *vcpu, unsigned long val) > >>>>>> { > >>>>>> - if (to_vmx(vcpu)->nested.vmxon && > >>>>>> - ((val & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON)) > >>>>>> - return 1; > >>>>>> - > >>>>>> if (is_guest_mode(vcpu)) { > >>>>>> + struct vmcs12 *vmcs12 = get_vmcs12(vcpu); > >>>>>> + unsigned long orig_val = val; > >>>>>> + > >>>>>> /* > >>>>>> * We get here when L2 changed cr0 in a way that did not change > >>>>>> * any of L1's shadowed bits (see nested_vmx_exit_handled_cr), > >>>>>> - * but did change L0 shadowed bits. This can currently happen > >>>>>> - * with the TS bit: L0 may want to leave TS on (for lazy fpu > >>>>>> - * loading) while pretending to allow the guest to change it. > >>>>>> + * but did change L0 shadowed bits. > >>>>>> */ > >>>>>> - if (kvm_set_cr0(vcpu, (val & vcpu->arch.cr0_guest_owned_bits) | > >>>>>> - (vcpu->arch.cr0 & ~vcpu->arch.cr0_guest_owned_bits))) > >>>>>> + val = (val & ~vmcs12->cr0_guest_host_mask) | > >>>>>> + (vmcs_read64(GUEST_CR0) & vmcs12->cr0_guest_host_mask); > >>>>> I think using GUEST_CR0 here is incorrect. It contains combination of bits > >>>>> set by L2, L1 and L0 and here we need to get only L2/L1 mix which is in > >>>>> vcpu->arch.cr0 (almost, but good enough for this case). Why vcpu->arch.cr0 > >>>>> contains right L2/L1 mix? > >>>> > >>>> L0/L1. E.g., kvm_set_cr0 unconditionally injects X86_CR0_ET and masks > >>>> out reserved bits. But you are right, GUEST_CR0 isn't much better. And > >>>> maybe that mangling in kvm_set_cr0 is a corner case we can ignore. > >>>> > >>> I think we can. ET is R/O and wired to 1, so it does not matter what > >>> guest writes there it should be treated as 1. About reserved bits spec > >>> says that software should write what it reads there and does not specify > >>> what happens if software does not follow this. > >>> > >>>>> Because it was set to vmcs12->guest_cr0 during > >>>>> L2 #vmentry. While L2 is running three things may happen to CR0: > >>>>> > >>>>> 1. L2 writes to a bit that is not shadowed neither by L1 nor by L0. It > >>>>> will go strait to GUEST_CR0. > >>>>> 2. L2 writes to a bit shadowed by L1. L1 #vmexit will be emulated. On the > >>>>> next #vmetry vcpu->arch.cr0 will be set to whatever value L1 calculated. > >>>>> 3. L2 writes to a bit shadowed by L0, but not L1. This is the case we > >>>>> are handling here. And if we will do it right vcpu->arch.cr0 will be > >>>>> up-to-date at the end. > >>>>> > >>>>> The only case when, while this code running, vcpu->arch.cr0 has not > >>>>> up-to-date value is if 1 happened, but since L2 guest overwriting cr0 > >>>>> here anyway it does not matter what it previously set in GUEST_CR0. The > >>>>> correct bits are in a new cr0 value provided by val and accessible by > >>>>> (val & ~vmcs12->cr0_guest_host_mask). > >>>> > >>>> I need to think about it again. Maybe vmcs12->guest_cr0 is best, but > >>>> that's a shot from the hips now. > >>>> > >>> I do not think it is correct because case 3 does not update it. So if 3 > >>> happens twice without L1 #vmexit between then vmcs12->guest_cr0 will be > >>> outdated. > >> > >> Again, the only thing that matters here is L1's, not L0's view on the > >> "real" CR0 value. So guest_cr0 is never outdated (/wrt > >> cr0_guest_host_mask) as it will be updated by L1 in step 2. Even if > >> arch.cr0 vs. guest_cr0 makes no difference in practice, the latter is > >> more consistent, so I will go for it unless you can convince me it is wrong. > >> > > Hmm, yes you are right that wrt cr0_guest_host_mask guest_cr0 should be > > up-to-date. Please write a big comment about it. > > Will do. > > > And what about moving VMXON_CR0_ALWAYSON check into vmx_set_cr0()? > > That doesn't make much sense for CR0 (due to the differences between > vmxon and guest mode - and lacking return code of set_cr4). But I can > consolidate the CR4 checks. > Isn't vmxon check is implicit in a guest mode. i.e if is_guest_mode() is trues then vmxon is on? Return code can be added. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html