Re: [PATCH] KVM: nVMX: Fix setting of CR0 and CR4 in guest mode

Jan Kiszka <jan.kiszka@xxxxxxxxxxx> · Mon, 04 Mar 2013 17:01:47 +0100

On 2013-03-04 16:30, Nadav Har'El wrote:
> On Mon, Mar 04, 2013, Jan Kiszka wrote about "Re: [PATCH] KVM: nVMX: Fix setting of CR0 and CR4 in guest mode":
>>>>>>  	if (is_guest_mode(vcpu)) {
>>>>>> -		/*
>>>>>> -		 * We get here when L2 changed cr0 in a way that did not change
>>>>>> -		 * any of L1's shadowed bits (see nested_vmx_exit_handled_cr),
>>>>>> -		 * but did change L0 shadowed bits. This can currently happen
>>>>>> -		 * with the TS bit: L0 may want to leave TS on (for lazy fpu
>>>>>> -		 * loading) while pretending to allow the guest to change it.
>>>>>> -		 */
>>>>> Can't say I understand this patch yet, but it looks like the comment is
>>>>> still valid. Why have you removed it?
>>>>
>>>> L0 allows L1 or L2 at most to own TS, the rest is host-owned. I think
>>>> the comment was always misleading.
>>>>
>>> I do not see how it is misleading. For everything but TS we will not get
>>> here (if L1 is kvm). For TS we will get here if L1 allows L2 to change
>>> it, but L0 does not.
>>
>> For everything *but guest-owned* we will get here, thus for most CR0
>> accesses (bit-wise, not regarding frequency).
> 
> For most CR0 bits, L1 (at least, a KVM one) will shadow (trap) them, so
> we won't get to this point you modified at all... Instead,
> nested_vmx_exit_handled_cr() would notice that a shadowed-by-L1 bit
> was modified so an exit to L1 is required. We only get to that code
> you changed if a bit was modified that L1 did *not* want to trap, but L0 did.
> This is definitely not the bit-wise majority of the cases - unless you
> have an L1 that does not trap most of the CR0 bits.
> 
> But I'm more worried about the actual code change :-) I didn't
> understand if there's a situation where the existing code did something
> wrong, or why it was wrong. Did you check the lazy-FPU-loading (TS bit)
> aspect of your new code? To effectively check this, what I had to do
> is to run on all of L0, L1, and L2, long runs of parallel "make" (make -j3) -
> concurrently. Even code which doesn't do floating-point calculations uses
> the FPU sometimes for its wide registers, so all these processes, guests
> and guest's guests, compete for the FPU, exercising very well this code
> path. If the TS bit is handled wrongly, some of these make processes
> will die, when one of the compilations dies of SIGSEGV (forgetting to
> set the FPU registers leads to some uninitialized pointers being used),
> so it's quite easy to exercise this.

I'm not focusing on hosting KVM but arbitrary guests. So I looked at it
generically, defining what bits should L1 effectively hand over to L0,
like real hardware would do when setting CR0/4. And that value was
wrongly calculated, breaking in practice when you allow unrestricted
guest mode, ie. when L2 starts playing with PE and PG - just to name one
prominent scenario. By focusing on TS for KVM-on-KVM without
unrestricted guest mode, you weren't able to trigger this.

But if you can tell me, where I may pass wrong values to kvm_set_cr0/4,
I'm all ears.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html