Jan Kiszka <jan.kiszka@xxxxxx> writes:

> On 2014-03-26 21:22, Bandan Das wrote:
>> Jan Kiszka <jan.kiszka@xxxxxx> writes:
>>
>>> On 2014-03-22 17:43, Bandan Das wrote:
>>>> Jan Kiszka <jan.kiszka@xxxxxx> writes:
>>>>
>>>>> On 2014-03-20 21:58, Bandan Das wrote:
>>>>>> Jan Kiszka <jan.kiszka@xxxxxxxxxxx> writes:
>>>>>>
>>>>>>> On 2014-03-20 04:28, Bandan Das wrote:
>>>>>>>> Some L1 hypervisors such as Xen seem to be calling invept after
>>>>>>>> vmclear or before vmptrld on L2. In this case, proceed with
>>>>>>>> falling through and syncing roots as a case where
>>>>>>>> context-wide invalidation can't be supported.
>>>>>>>
>>>>>>> Can we also base this behaviour on a statement in the SDM? On first
>>>>>>> glance, I do not find anything like this over there.
>>>>>>
>>>>>> The SDM has nothing of this sort explicitly mentioned, but 28.3.3.1
>>>>>> "Operations that Invalidate Cached Mappings" does mention that
>>>>>> the instruction may invalidate mappings associated with other
>>>>>> EP4TAs (even in single context).
>>>>>
>>>>> Yes, "may". So we are implementing undefined behavior in order to
>>>>> please a broken hypervisor that relies on it? Then please state this
>>>>> in the patch and probably also inform Xen about their issue.
>>>>
>>>> Why undefined behavior? We don't do anything specific for the
>>>> single-context invalidation case anyway, i.e. if the eptp matches what
>>>> vmcs12 has, single-context invalidation already falls through to the
>>>> global invalidation case. All this change does is add the "L1 calls
>>>> invept after vmclear and before vmptrld" case to the list of cases
>>>> that fall through to global invalidation, since nvmx has no knowledge
>>>> of the current eptp in this case.
>>>
>>> OK, I think I misunderstood what the guest expects and how we currently
>>> achieve this: we do not track the mapping between guest and host eptp,
>>> thus cannot properly emulate its behaviour. We therefore need to flush
>>> everything.
>>>
>>>> Or do you think we should rethink this approach?
>>>
>>> Well, I wonder if we should expose single-context invept support at all.
>>>
>>> I'm also wondering if we are returning proper flags on
>>>
>>> 	if ((operand.eptp & eptp_mask) !=
>>> 			(nested_ept_get_cr3(vcpu) & eptp_mask))
>>> 		break;
>>
>> That does sound plausible, but then we would have to get rid of the
>> little optimization in the code you have quoted above. We would have
>> to flush and sync roots for all invept calls. So my preference is to
>> keep it.
>
> Do you have any numbers on how many per-context invept calls we are able
> to ignore? At least for KVM this should be 0, as we only flush on
> changes to the current EPT structures.

Just instrumented a Debian L2 bootup under L1 Xen, and there were 0
per-context invept calls (with a valid vmcs12). I will spin out a
version for testing where we don't advertise single-context invept and,
if everything works, put a comment as to why we are removing support.

Thanks,
Bandan

> Jan
>
>>> Neither nested_vmx_succeed nor nested_vmx_fail* is called if this
>>> condition evaluates to true.
>>
>> It appears we should always call nested_vmx_succeed(). If you are ok,
>> I will send a v2.
>>
>> Thanks,
>> Bandan
>>
>>> Jan
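
For reference, a minimal sketch of the nested_vmx_succeed() point raised
above, i.e. what the single-context arm of handle_invept() could look
like if it reported VMsucceed on the elided-flush path. This assumes the
2014-era layout of arch/x86/kvm/vmx.c (the switch on the invept type and
the kvm_mmu_sync_roots()/kvm_mmu_flush_tlb() helpers of that period) and
reuses the names from the snippet quoted in the thread; it is an
illustration, not the committed patch:

	switch (type) {
	case VMX_EPT_EXTENT_CONTEXT:
		if ((operand.eptp & eptp_mask) !=
				(nested_ept_get_cr3(vcpu) & eptp_mask)) {
			/*
			 * The flush for a non-current EPTP is elided, but
			 * from L1's point of view the instruction still
			 * executed successfully, so set the VMsucceed
			 * flags instead of leaving them untouched.
			 */
			nested_vmx_succeed(vcpu);
			break;
		}
		/* Current EPTP: fall through to a global invalidation. */
	case VMX_EPT_EXTENT_GLOBAL:
		kvm_mmu_sync_roots(vcpu);
		kvm_mmu_flush_tlb(vcpu);
		nested_vmx_succeed(vcpu);
		break;
	default:
		BUG_ON(1);
		break;
	}

If single-context invept were simply not advertised to L1, as proposed
above, the VMX_EPT_EXTENT_CONTEXT arm would become unreachable for a
well-behaved guest and could be dropped entirely.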