Re: nVMX: Shadowing of CPU_BASED_VM_EXEC_CONTROL broken

Wanpeng Li <wanpeng.li@xxxxxxxxxxxxxxx> · Thu, 9 Oct 2014 07:58:08 +0800

On Thu, Oct 09, 2014 at 07:34:47AM +0800, Wanpeng Li wrote:
>On Wed, Oct 08, 2014 at 05:07:48PM +0200, Jan Kiszka wrote:
>>On 2014-10-08 12:34, Paolo Bonzini wrote:
>>> On 08/10/2014 12:29, Jan Kiszka wrote:
>>>>>> But it would write to the vmcs02, not to the shadow VMCS; the shadow
>>>>>> VMCS is active during copy_shadow_to_vmcs12/copy_vmcs12_to_shadow, and
>>>>>> at no other time.  It is not clear to me how the VIRTUAL_INTR_PENDING
>>>>>> bit ended up from the vmcs02 (where it is perfectly fine) to the vmcs12.
>>>> Well, but somehow that bit ends up in vmcs12, that's a fact. It is also
>>>> a fact that the problem disappears when shadowing is disabled. I need to
>>>> think about the path again. Maybe there is just a bug, not a conceptual
>>>> issue.
>>> 
>>> Yeah, and at this point we cannot actually exclude a processor bug.  Can
>>> you check that the bit is not in the shadow VMCS just before VM entry
>>> (vmlaunch/vmresume), or just after enable_irq_window?
>>> 
>>> Having a kvm-unit-tests testcase could also be of some help.
>>
>>As usual, this was a nasty race that involved some concurrent VCPUs and
>>proper host load, so hard to write unit tests...
>>
>>diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>index 04fa1b8..d6bcaca 100644
>>--- a/arch/x86/kvm/vmx.c
>>+++ b/arch/x86/kvm/vmx.c
>>@@ -6417,6 +6417,8 @@ static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx)
>> 	const unsigned long *fields = shadow_read_write_fields;
>> 	const int num_fields = max_shadow_read_write_fields;
>> 
>>+	preempt_disable();
>>+
>> 	vmcs_load(shadow_vmcs);
>> 
>> 	for (i = 0; i < num_fields; i++) {
>>@@ -6440,6 +6442,8 @@ static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx)
>> 
>> 	vmcs_clear(shadow_vmcs);
>> 	vmcs_load(vmx->loaded_vmcs->vmcs);
>>+
>>+	preempt_enable();
>> }
>> 
>> static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx)
>>@@ -6457,6 +6461,8 @@ static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx)
>> 	u64 field_value = 0;
>> 	struct vmcs *shadow_vmcs = vmx->nested.current_shadow_vmcs;
>> 
>>+	preempt_disable();
>>+
>> 	vmcs_load(shadow_vmcs);
>> 
>> 	for (q = 0; q < ARRAY_SIZE(fields); q++) {
>>@@ -6483,6 +6489,8 @@ static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx)
>> 
>> 	vmcs_clear(shadow_vmcs);
>> 	vmcs_load(vmx->loaded_vmcs->vmcs);
>>+
>>+	preempt_enable();
>> }
>> 
>> /*
>>
>>No proper patch yet because there might be a smarter approach than the
>>preempt_disable() hammer. But the point is that we temporarily load a
>>vmcs without updating loaded_vmcs->vmcs. Now, if some other VCPU is
>>scheduled in right in the middle of this, the wrong vmcs will be
>>flushed and then reloaded - e.g. a non-shadow vmcs with that interrupt
>>window flag set...
>
>Can a non-shadow vmcs and a shadow vmcs be present in one system simultaneously?

Ah, got it, you mean non-current-shadow vmcs.
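
To make the race concrete, here is a rough user-space model of it. This is
only a sketch under assumptions, not kernel code: struct vmcs, struct vcpu,
hw_current, vmread_exec_control() and preempt_and_resume() are hypothetical
stand-ins, and a single exec_control field stands in for
CPU_BASED_VM_EXEC_CONTROL. It models the case where the shadow vmcs is
loaded temporarily, the task is preempted, and on sched-in the tracked
loaded_vmcs (the vmcs02) is reloaded instead of the shadow, so the read
that was meant for the shadow hits the vmcs02.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; none of this is the kernel's API. */
#define VIRTUAL_INTR_PENDING 0x4u   /* interrupt-window bit in the exec controls */

struct vmcs {
	unsigned int exec_control;  /* stands in for CPU_BASED_VM_EXEC_CONTROL */
};

struct vcpu {
	struct vmcs *loaded_vmcs;   /* what KVM believes is loaded (vmcs02 here) */
	struct vmcs *shadow_vmcs;   /* temporarily loaded by the copy helpers */
};

/* Models the per-CPU "current VMCS" pointer that a VMCS load sets. */
static struct vmcs *hw_current;

static void vmcs_load(struct vmcs *v)
{
	hw_current = v;
}

/* A read always goes to whatever VMCS is current on this CPU. */
static unsigned int vmread_exec_control(void)
{
	assert(hw_current != NULL);
	return hw_current->exec_control;
}

/*
 * Models preemption: another VCPU runs on this CPU, then ours is
 * scheduled back in and the tracked vmcs is reloaded, not the shadow.
 */
static void preempt_and_resume(struct vcpu *v, struct vmcs *other)
{
	vmcs_load(other);
	vmcs_load(v->loaded_vmcs);
}

/* Returns the value that would be stored into vmcs12. */
static unsigned int copy_shadow_to_vmcs12(struct vcpu *v, int preempted)
{
	struct vmcs other = { 0 };
	unsigned int value;

	vmcs_load(v->shadow_vmcs);
	if (preempted)
		preempt_and_resume(v, &other);   /* race window */
	value = vmread_exec_control();           /* may now read vmcs02 */
	vmcs_load(v->loaded_vmcs);
	return value;
}
```

With a vmcs02 that has VIRTUAL_INTR_PENDING set and a clean shadow vmcs,
the model returns a clean value when preempted is 0 and a value with the
bit set when preempted is 1. Disabling preemption around the
load/copy/reload sequence, as in Jan's diff, corresponds to never taking
the preempted branch.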

Regards,
Wanpeng Li 

>
>Regards,
>Wanpeng Li 
>
>>
>>Patch is currently under heavy load testing here, but it looks very good
>>as the bug was quickly reproducible before I applied it.
>>
>>Jan
>>
>>-- 
>>Siemens AG, Corporate Technology, CT RTC ITP SES-DE
>>Corporate Competence Center Embedded Linux
>>--
>>To unsubscribe from this list: send the line "unsubscribe kvm" in
>>the body of a message to majordomo@xxxxxxxxxxxxxxx
>>More majordomo info at  http://vger.kernel.org/majordomo-info.html