On 15/02/2017 19:19, Jim Mattson wrote:
> On Wed, Feb 15, 2017 at 9:37 AM, Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
>> Yes, I think it is better. The really nasty part of
>> "KVM_GET_NESTED_STATE is not transparent to the guest" is that
>> KVM_GET_NESTED_STATE typically happens while the guest is quiescent and
>> userspace doesn't expect guest memory to change. (2) is a special case
>> of this.
>>
>> Instead, flushing the VMCS on L2->userspace exit happens during KVM_RUN,
>> which should not break any invariants that userspace relies on.
>
> It is certainly within the bounds of the Intel specification to flush the
> VMCS cache at any time, but I think it would be better if the VCPU behaved more
> deterministically (with respect to guest-visible events).

Yes, it's a trade off. Of course if you think of SMIs on physical
hardware, determinism is already thrown away. But I understand that we
can do better than physical processors in this regard. :)

> Just a proposal. I think a separate vmcs02 has the potential for faster emulated
> VM-entry/VM-exit, but at the cost of greater code complexity. A unified vmcs0X
> could certainly be as fast as the existing implementation and a lot simpler.

I agree. As I said, we need to benchmark it either way. I honestly have
no idea where time is spent (on a processor with VMCS shadowing) in a
tight loop of L1->L2->L1 entries and exits.

>>> To be honest, I'm not sure what it is that you are trying to optimize
>>> for. Is your concern the extra 4K of VCPU state? I would argue that
>>> the VMCS cache is part of the physical CPU state (though on a physical
>>> CPU, not every field is cached), and that it therefore makes sense to
>>> include the VMCS cache as part of the VCPU state, apart from guest
>>> physical memory.
>>
>> It's not really the extra 4K, but rather the ugly "blob" aspect of it
>> and having to maintain compatibility for it. Regarding the blobbiness,
>> see for example KVM_GET/SET_XSAVE, where you can actually marshal and
>> unmarshal the contents of the XSAVE area into the registers. Regarding
>> compatibility, I'm worried that once someone looks at SVM more closely,
>> we will have to store the VMCB in GET_NESTED_STATE too.
>
> Instead of a "blob," we could pass it out to userspace as a 'struct
> vmcs12,' padded out to 4k for compatibility purposes. Then userspace
> could marshal and unmarshal the contents, and it would be a little
> easier than marshalling and unmarshalling the same contents in guest
> physical memory.

You mean marshalling/unmarshalling it for stuff such as extracting the
EPTP? Yeah, that's a good reason to keep it in GET_NESTED_STATE. OTOH,
it's more work to do in userspace. Tradeoffs...

> I don't know what to say about SVM. I think we should be prepared to
> store the VMCB at some point, even if we don't now.

The in-memory VMCB is always consistent with processor state after
VMRUN, but not necessarily during it, so I agree. Of course the same
solution of flushing it out on L2->userspace exit would work, though I
understand why you think it's a hack.

Paolo
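
P.S. To make the "pass it out as a 'struct vmcs12' padded out to 4k"
idea a bit more concrete, here is a very rough sketch of what such a
payload could look like from the userspace side. Every name in it
(kvm_nested_state, kvm_vmcs12, the field subset) is made up purely for
illustration; this is not a proposal for the actual UAPI, just a
picture of the marshalling we were talking about, e.g. pulling the
EPTP out without re-parsing guest memory.

/*
 * Hypothetical sketch only -- none of these names exist in the UAPI.
 * The cached VMCS is exposed as a typed structure padded out to 4K
 * instead of an opaque blob, so userspace can pick out single fields.
 */
#include <stdint.h>

#define KVM_NESTED_VMCS12_SIZE	4096	/* padding for future growth */

struct kvm_vmcs12 {			/* illustrative subset of fields */
	uint32_t revision_id;
	uint32_t abort;
	uint64_t ept_pointer;		/* the EPTP discussed above */
	uint64_t guest_rip;
	/* ... rest of the vmcs12 fields ... */
};

struct kvm_nested_state {		/* hypothetical ioctl payload */
	uint64_t current_vmptr;		/* gpa of the current VMCS, if any */
	uint32_t flags;
	uint32_t pad;
	union {
		struct kvm_vmcs12 vmcs12;
		uint8_t blob[KVM_NESTED_VMCS12_SIZE];
	};
};

/* Userspace-side helper: extract the EPTP without touching guest memory. */
static inline uint64_t nested_state_eptp(const struct kvm_nested_state *s)
{
	return s->vmcs12.ept_pointer;
}

The anonymous union is what would buy the compatibility headroom: the
structure can grow into the padding without changing the overall size,
at the cost of freezing the offsets of whatever fields we export first.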