On 15/02/2017 19:19, Jim Mattson wrote:
> On Wed, Feb 15, 2017 at 9:37 AM, Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
>> Yes, I think it is better. The really nasty part of
>> "KVM_GET_NESTED_STATE is not transparent to the guest" is that
>> KVM_GET_NESTED_STATE typically happens while the guest is quiescent and
>> userspace doesn't expect guest memory to change. (2) is a special case
>> of this.
>>
>> Instead, flushing the VMCS on L2->userspace exit happens during KVM_RUN,
>> which should not break any invariants that userspace relies on.
>
> It is certainly within the bounds of the Intel specification to flush the
> VMCS cache at any time, but I think it would be better if the VCPU behaved more
> deterministically (with respect to guest-visible events).

Yes, it's a trade off. Of course if you think of SMIs on physical
hardware, determinism is already thrown away. But I understand that we
can do better than physical processors in this regard. :)

> Just a proposal. I think a separate vmcs02 has the potential for faster emulated
> VM-entry/VM-exit, but at the cost of greater code complexity. A unified vmcs0X
> could certainly be as fast as the existing implementation and a lot simpler.

I agree. As I said, we need to benchmark it either way. I honestly have
no idea where time is spent (on a processor with VMCS shadowing) in a
tight loop of L1->L2->L1 entries and exits.

>>> To be honest, I'm not sure what it is that you are trying to optimize
>>> for. Is your concern the extra 4K of VCPU state? I would argue that
>>> the VMCS cache is part of the physical CPU state (though on a physical
>>> CPU, not every field is cached), and that it therefore makes sense to
>>> include the VMCS cache as part of the VCPU state, apart from guest
>>> physical memory.
>>
>> It's not really the extra 4K, but rather the ugly "blob" aspect of it
>> and having to maintain compatibility for it. Regarding the blobbiness,
>> see for example KVM_GET/SET_XSAVE, where you can actually marshal and
>> unmarshal the contents of the XSAVE area into the registers. Regarding
>> compatibility, I'm worried that once someone looks at SVM more closely,
>> we will have to store the VMCB in GET_NESTED_STATE too.
>
> Instead of a "blob," we could pass it out to userspace as a 'struct
> vmcs12,' padded out to 4k for compatibility purposes. Then userspace
> could marshal and unmarshal the contents, and it would be a little
> easier than marshalling and unmarshalling the same contents in guest
> physical memory.

You mean marshalling/unmarshalling it for stuff such as extracting the
EPTP? Yeah, that's a good reason to keep it in GET_NESTED_STATE. OTOH,
it's more work to do in userspace. Tradeoffs...

> I don't know what to say about SVM. I think we should be prepared to
> store the VMCB at some point, even if we don't now.

The in-memory VMCB is always consistent with processor state after
VMRUN, but not necessarily during it, so I agree. Of course the same
solution of flushing it out on L2->userspace exit would work, though I
understand why you think it's a hack.

Paolo
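
P.S. To make the "pass it out as a 'struct vmcs12' padded out to 4k"
idea a bit more concrete, here is a very rough sketch of what such a
payload could look like from the userspace side. Every name in it
(kvm_nested_state, kvm_vmcs12, the field subset) is made up purely for
illustration; this is not a proposal for the actual UAPI, just a
picture of the marshalling we were talking about, e.g. pulling the
EPTP out without re-parsing guest memory.

/*
 * Hypothetical sketch only -- none of these names exist in the UAPI.
 * The cached VMCS is exposed as a typed structure padded out to 4K
 * instead of an opaque blob, so userspace can pick out single fields.
 */
#include <stdint.h>

#define KVM_NESTED_VMCS12_SIZE	4096	/* padding for future growth */

struct kvm_vmcs12 {			/* illustrative subset of fields */
	uint32_t revision_id;
	uint32_t abort;
	uint64_t ept_pointer;		/* the EPTP discussed above */
	uint64_t guest_rip;
	/* ... rest of the vmcs12 fields ... */
};

struct kvm_nested_state {		/* hypothetical ioctl payload */
	uint64_t current_vmptr;		/* gpa of the current VMCS, if any */
	uint32_t flags;
	uint32_t pad;
	union {
		struct kvm_vmcs12 vmcs12;
		uint8_t blob[KVM_NESTED_VMCS12_SIZE];
	};
};

/* Userspace-side helper: extract the EPTP without touching guest memory. */
static inline uint64_t nested_state_eptp(const struct kvm_nested_state *s)
{
	return s->vmcs12.ept_pointer;
}

The anonymous union is what would buy the compatibility headroom: the
structure can grow into the padding without changing the overall size,
at the cost of freezing the offsets of whatever fields we export first.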