On Mon, Mar 04, 2013 at 08:23:52PM +0100, Jan Kiszka wrote:
> On 2013-03-04 19:39, Gleb Natapov wrote:
> > On Mon, Mar 04, 2013 at 07:08:08PM +0100, Jan Kiszka wrote:
> >> On 2013-03-04 18:56, Gleb Natapov wrote:
> >>> On Mon, Mar 04, 2013 at 03:25:47PM +0100, Jan Kiszka wrote:
> >>>> On 2013-03-04 15:15, Gleb Natapov wrote:
> >>>>> On Mon, Mar 04, 2013 at 03:09:51PM +0100, Jan Kiszka wrote:
> >>>>>> On 2013-03-04 14:22, Gleb Natapov wrote:
> >>>>>>> On Thu, Feb 28, 2013 at 10:44:47AM +0100, Jan Kiszka wrote:
> >>>>>>>> The logic for calculating the value with which we call kvm_set_cr0/4 was
> >>>>>>>> broken (will definitely be visible with nested unrestricted guest mode
> >>>>>>>> support). Also, we performed the check regarding CR0_ALWAYSON too early
> >>>>>>>> when in guest mode.
> >>>>>>>>
> >>>>>>>> What really needs to be done on both CR0 and CR4 is to mask out L1-owned
> >>>>>>>> bits and merge them in from GUEST_CR0/4. In contrast, arch.cr0/4 and
> >>>>>>>> arch.cr0/4_guest_owned_bits contain the mangled L0+L1 state and, thus,
> >>>>>>>> are not suited as input.
> >>>>>>>>
> >>>>>>>> For both CRs, we can then apply the check against VMXON_CRx_ALWAYSON and
> >>>>>>>> refuse the update if it fails. To be fully consistent, we implement this
> >>>>>>>> check now also for CR4.
> >>>>>>>>
> >>>>>>>> Finally, we have to set the shadow to the value L2 wanted to write
> >>>>>>>> originally.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@xxxxxxxxxxx>
> >>>>>>>> ---
> >>>>>>>>
> >>>>>>>> Found while making unrestricted guest mode working. Not sure what impact
> >>>>>>>> the bugs had on current feature level, if any.
> >>>>>>>>
> >>>>>>>> For interested folks, I've pushed my nEPT environment here:
> >>>>>>>>
> >>>>>>>>     git://git.kiszka.org/linux-kvm.git nept-hacking
> >>>>>>>>
> >>>>>>>>  arch/x86/kvm/vmx.c |   49 ++++++++++++++++++++++++++++++-------------------
> >>>>>>>>  1 files changed, 30 insertions(+), 19 deletions(-)
> >>>>>>>>
> >>>>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >>>>>>>> index 7cc566b..d1dac08 100644
> >>>>>>>> --- a/arch/x86/kvm/vmx.c
> >>>>>>>> +++ b/arch/x86/kvm/vmx.c
> >>>>>>>> @@ -4605,37 +4605,48 @@ vmx_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall)
> >>>>>>>>  /* called to set cr0 as appropriate for a mov-to-cr0 exit. */
> >>>>>>>>  static int handle_set_cr0(struct kvm_vcpu *vcpu, unsigned long val)
> >>>>>>>>  {
> >>>>>>>> -        if (to_vmx(vcpu)->nested.vmxon &&
> >>>>>>>> -            ((val & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON))
> >>>>>>>> -                return 1;
> >>>>>>>> -
> >>>>>>>>          if (is_guest_mode(vcpu)) {
> >>>>>>>> -                /*
> >>>>>>>> -                 * We get here when L2 changed cr0 in a way that did not change
> >>>>>>>> -                 * any of L1's shadowed bits (see nested_vmx_exit_handled_cr),
> >>>>>>>> -                 * but did change L0 shadowed bits. This can currently happen
> >>>>>>>> -                 * with the TS bit: L0 may want to leave TS on (for lazy fpu
> >>>>>>>> -                 * loading) while pretending to allow the guest to change it.
> >>>>>>>> -                 */
> >>>>>>> Can't say I understand this patch yet, but it looks like the comment is
> >>>>>>> still valid. Why have you removed it?
> >>>>>>
> >>>>>> L0 allows L1 or L2 at most to own TS, the rest is host-owned. I think
> >>>>>> the comment was always misleading.
> >>>>>>
> >>>>> I do not see how it is misleading. For everything but TS we will not get
> >>>>> here (if L1 is kvm). For TS we will get here if L1 allows L2 to change
> >>>>> it, but L0 does not.
> >>>>
> >>>> For everything *but guest-owned* we will get here, thus for most CR0
> >>>> accesses (bit-wise, not regarding frequency).
> >>>>
> >>> I do not see how.
> >>> If bit is trapped by L1 we will not get here. We will
> >>> do vmexit to L1 instead. nested_vmx_exit_handled_cr() checks this condition.
> >>> I am not arguing about your code (didn't grok it yet), but the comment
> >>> still makes sense to me.
> >>
> >> "We get here when L2 changed cr0 in a way that did not change any of
> >> L1's shadowed bits (see nested_vmx_exit_handled_cr), but did change L0
> >> shadowed bits." That I can sign. But the rest about TS is just
> >> misleading as we trap _every_ change in L0 - except for TS under certain
> >> conditions. The old code was tested against TS only, that's what the
> >> comment witnesses.
> >>
> > TS is just an example of how we can get here with KVM on KVM. Obviously
> > other hypervisors may have different configuration. L2 may allow full
> > guest access to CR0 and then each CR0 write by L2 will be handled here.
> > Under what other condition "we trap _every_ change in L0 - except for
> > TS" here?
>
> On FPU activation:
>
>     cr0_guest_owned_bits = X86_CR0_TS;
>
> And on FPU deactivation:
>
>     cr0_guest_owned_bits = 0;
>
That's exactly the TS case that the comment explains. Note that
CR0_GUEST_HOST_MASK = ~cr0_guest_owned_bits.

> >
> >> If you prefer, I'll leave part one in.
> >>
> > Please do so. Without the comment it is not obvious why exit condition
> > is not checked here. Still do not see why you object to TS part.
>
> It describes a corner case in a way that suggests this is the only
> reason why we get here.
>
For KVM on KVM it is.

--
Gleb.
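
For readers following the thread, the two relations discussed above (the guest/host mask being the complement of the guest-owned bits, and the "mask out L1-owned bits and merge them in from GUEST_CR0/4" step from the commit message) can be sketched in standalone C. This is only an illustration under stated assumptions, not the code from the patch: the helper names and the ALWAYSON_SKETCH constant are hypothetical, and the choice of PE, NE and PG for the always-on set is an assumption, since the definition of VMXON_CR0_ALWAYSON is not quoted in the thread. Only the CR0 bit positions are architectural.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural CR0 bit positions. */
#define CR0_PE (1ULL << 0)   /* protection enable */
#define CR0_TS (1ULL << 3)   /* task switched */
#define CR0_NE (1ULL << 5)   /* numeric error */
#define CR0_PG (1ULL << 31)  /* paging */

/*
 * Assumption: stands in for VMXON_CR0_ALWAYSON, whose definition is
 * not shown in the thread.
 */
#define ALWAYSON_SKETCH (CR0_PE | CR0_NE | CR0_PG)

/* Hypothetical helper: every bit that is not guest-owned is trapped. */
static uint64_t cr_guest_host_mask(uint64_t guest_owned_bits)
{
    return ~guest_owned_bits;
}

/*
 * Hypothetical helper: keep the bits L1 leaves to L2 from the value L2
 * tried to write, and take the L1-owned bits from L1's GUEST_CRx.
 */
static uint64_t merge_l1_owned_bits(uint64_t val, uint64_t l1_guest_cr,
                                    uint64_t l1_mask)
{
    return (val & ~l1_mask) | (l1_guest_cr & l1_mask);
}

/* Hypothetical helper: the update is refused when this returns false. */
static bool passes_alwayson(uint64_t val)
{
    return (val & ALWAYSON_SKETCH) == ALWAYSON_SKETCH;
}
```

If L1 owns PE, NE and PG and L2 writes a value with only TS set, the merged value carries TS plus L1's three bits and passes the check. Per the commit message, the read shadow would then be set to the value L2 originally wanted to write, not to the merged one.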