Re: [PATCH 7/8] kvm: nVMX: Introduce KVM_CAP_VMX_STATE

The other unfortunate thing about flushing the "current" VMCS12 state
to guest memory on each L2->userspace transition is that much of this
state is in the VMCS02. So, it's not just a matter of writing a
VMCS12_SIZE blob to guest memory; first, the cached VMCS12 has to be
updated from the VMCS02 by calling sync_vmcs12(). This will be
particularly bad for double-nesting, where L1 kvm has to take all of
those VMREAD VM-exits.
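
For concreteness, here is a rough sketch of what that flush would look
like on the kernel side. The helpers (sync_vmcs12(), get_vmcs12(), the
nested.current_vmptr bookkeeping) follow the existing nVMX code, but the
function itself is illustrative, not the actual patch:

/*
 * Illustrative sketch only: flush the cached VMCS12 back to guest
 * memory before returning to userspace while the vCPU is in L2.
 */
static void nested_flush_cached_vmcs12(struct kvm_vcpu *vcpu)
{
        struct vcpu_vmx *vmx = to_vmx(vcpu);
        struct vmcs12 *vmcs12 = get_vmcs12(vcpu);

        if (!is_guest_mode(vcpu) || vmx->nested.current_vmptr == -1ull)
                return;

        /*
         * Pull the fields that currently live in the VMCS02 back into
         * the cached VMCS12; these are the VMREADs that become VM-exits
         * for L1 in the double-nesting case.
         */
        sync_vmcs12(vcpu, vmcs12);

        /* Only now can the full VMCS12_SIZE region be written out. */
        kvm_write_guest(vcpu->kvm, vmx->nested.current_vmptr, vmcs12,
                        VMCS12_SIZE);
}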

If you still prefer this method, I will go ahead and do it, but I
remain opposed.

On Wed, Feb 15, 2017 at 1:28 PM, Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
>
>
> On 15/02/2017 19:19, Jim Mattson wrote:
>> On Wed, Feb 15, 2017 at 9:37 AM, Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
>>> Yes, I think it is better.  The really nasty part of
>>> "KVM_GET_NESTED_STATE is not transparent to the guest" is that
>>> KVM_GET_NESTED_STATE typically happens while the guest is quiescent and
>>> userspace doesn't expect guest memory to change.  (2) is a special case
>>> of this.
>>>
>>> Instead, flushing the VMCS on L2->userspace exit happens during KVM_RUN,
>>> which should not break any invariants that userspace relies on.
>>
>> It is certainly within the bounds of the Intel specification to flush the
>> VMCS cache at any time, but I think it would be better if the VCPU behaved more
>> deterministically (with respect to guest-visible events).
>
> Yes, it's a trade-off.  Of course, if you think of SMIs on physical
> hardware, determinism is already thrown away.  But I understand that we
> can do better than physical processors in this regard. :)
>
>> Just a proposal. I think a separate vmcs02 has the potential for faster emulated
>> VM-entry/VM-exit, but at the cost of greater code complexity. A unified vmcs0X
>> could certainly be as fast as the existing implementation and a lot simpler.
>
> I agree.  As I said, we need to benchmark it either way.  I honestly
> have no idea where time is spent (on a processor with VMCS shadowing) in
> a tight loop of L1->L2->L1 entries and exits.
>
>>>> To be honest, I'm not sure what it is that you are trying to optimize
>>>> for. Is your concern the extra 4K of VCPU state? I would argue that
>>>> the VMCS cache is part of the physical CPU state (though on a physical
>>>> CPU, not every field is cached), and that it therefore makes sense to
>>>> include the VMCS cache as part of the VCPU state, apart from guest
>>>> physical memory.
>>>
>>> It's not really the extra 4K, but rather the ugly "blob" aspect of it
>>> and having to maintain compatibility for it.  Regarding the blobbiness,
>>> see for example KVM_GET/SET_XSAVE, where you can actually marshal and
>>> unmarshal the contents of the XSAVE area into the registers.  Regarding
>>> compatibility, I'm worried that once someone looks at SVM more closely,
>>> we will have to store the VMCB in GET_NESTED_STATE too.
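
(As a minimal userspace illustration of that point, assuming an
already-created vcpu_fd: KVM_GET_XSAVE returns the 4K XSAVE area, whose
layout is architecturally defined, so individual registers can be picked
out of it rather than treating it as an opaque blob. The offset below is
the XMM0 slot in the legacy FXSAVE region.)

#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int dump_xmm0(int vcpu_fd)
{
        struct kvm_xsave xsave;
        unsigned char xmm0[16];

        if (ioctl(vcpu_fd, KVM_GET_XSAVE, &xsave) < 0)
                return -1;

        /* XMM0 occupies bytes 160-175 of the legacy (FXSAVE) region. */
        memcpy(xmm0, (unsigned char *)xsave.region + 160, sizeof(xmm0));
        printf("XMM0 bytes 0-3: %02x %02x %02x %02x\n",
               xmm0[0], xmm0[1], xmm0[2], xmm0[3]);
        return 0;
}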
>>
>> Instead of a "blob," we could pass it out to userspace as a 'struct
>> vmcs12,' padded out to 4k for compatibility purposes. Then userspace
>> could marshal and unmarshal the contents, and it would be a little
>> easier than marshalling and unmarshalling the same contents in guest
>> physical memory.
>
> You mean marshalling/unmarshalling it for stuff such as extracting the
> EPTP?  Yeah, that's a good reason to keep it in GET_NESTED_STATE.
>
> OTOH, it's more work to do in userspace.  Tradeoffs...
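
(Purely for illustration, the userspace-facing shape of that proposal
might look something like the below. The struct name, field layout, and
padding here are invented, not an actual ABI; the point is just that a
typed vmcs12, padded out to 4K, lets userspace read fields such as the
EPTP directly.)

#include <stdint.h>

struct nested_vmx_state {
        uint64_t vmxon_ptr;
        uint64_t current_vmptr;
        union {
                struct {
                        uint32_t revision_id;
                        uint32_t abort;
                        uint64_t ept_pointer;   /* e.g. the EPTP mentioned above */
                        /* ... remaining vmcs12 fields ... */
                } vmcs12;
                uint8_t pad[4096];              /* padded for compatibility */
        } u;
};

static inline uint64_t nested_state_eptp(const struct nested_vmx_state *s)
{
        return s->u.vmcs12.ept_pointer;
}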
>
>> I don't know what to say about SVM. I think we should be prepared to
>> store the VMCB at some point, even if we don't now.
>
> The in-memory VMCB is always consistent with processor state after
> VMRUN, but not necessarily during it, so I agree.  Of course the same
> solution of flushing it out on L2->userspace exit would work, though I
> understand why you think it's a hack.
>
> Paolo
