Re: [PATCH 7/8] kvm: nVMX: Introduce KVM_CAP_VMX_STATE

Certain VMX state cannot be extracted from the kernel today. As you
point out, this includes the vCPU's VMX operating mode {legacy, VMX
root operation, VMX non-root operation}, the current VMCS GPA (if
any), and the VMXON region GPA (if any). Perhaps these could be
appended to the state(s) extracted by one or more existing APIs rather
than introducing a new API, but I think there's sufficient
justification here for a new GET/SET_NESTED_STATE API.
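
To make that concrete, the payload could carry something along these
lines; the struct and field names below are purely illustrative, not
the uapi proposed in this series:

struct vmx_nested_state {
	__u16 flags;      /* e.g. "in VMX operation", "in VMX non-root (L2) mode" */
	__u16 format;     /* layout version of the data that follows              */
	__u32 size;       /* total payload size, including the VMCS12 image       */
	__u64 vmxon_pa;   /* GPA of the VMXON region, or -1ull if VMX is off      */
	__u64 vmcs12_pa;  /* GPA of the current VMCS, or -1ull if none is loaded  */
	__u8  vmcs12[0];  /* cached VMCS12 image, discussed below                 */
};

The operating-mode flags and the two GPAs are exactly the pieces that
no existing ioctl exposes today; the VMCS12 image that follows them is
where the redundancy with the existing GET/SET APIs comes in.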

Most L2 guest state can already be extracted by existing APIs, like
GET_SREGS. However, restoring it is a bit problematic today. SET_SREGS
will write into the current VMCS, but we have no existing mechanism
for transferring guest state from vmcs01 to vmcs02. On restore, do we
want to dictate that the vCPU's VMX operating mode has to be restored
before SET_SREGS is called, or do we provide a mechanism for
transferring vmcs01 guest state to vmcs02? If we do dictate that the
vCPU's operating mode has to be restored first, then SET_SREGS will
naturally write into vmcs02, but we'll have to create a mechanism for
building an initial vmcs02 out of nothing.
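
To spell out the two orderings (the ioctl sequence is schematic, with
vcpu_fd, sregs and nested assumed to be set up, and KVM_SET_NESTED_STATE
standing in for whatever the new ioctl ends up being called):

/* Option A: restore the VMX operating mode first.  SET_SREGS then
 * naturally writes into vmcs02, but KVM has to be able to build an
 * initial vmcs02 before any L2 register state has been restored. */
ioctl(vcpu_fd, KVM_SET_NESTED_STATE, &nested);  /* vCPU re-enters L2 */
ioctl(vcpu_fd, KVM_SET_SREGS, &sregs);          /* lands in vmcs02   */

/* Option B: restore registers first.  SET_SREGS writes into vmcs01,
 * so KVM would need a new mechanism to transfer that guest state
 * from vmcs01 to vmcs02 when the operating mode is restored. */
ioctl(vcpu_fd, KVM_SET_SREGS, &sregs);          /* lands in vmcs01    */
ioctl(vcpu_fd, KVM_SET_NESTED_STATE, &nested);  /* vmcs01 -> vmcs02 ? */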

The only mechanism we have today for building a vmcs02 starts with a
vmcs12. Building on that mechanism, it is fairly straightforward to
write GET/SET_NESTED_STATE. Though there is quite a bit of redundancy
with GET/SET_SREGS, GET/SET_VCPU_EVENTS, etc., if you capture all of
the L2 state in VMCS12 format, you can restore it pretty easily using
the existing infrastructure, without worrying about the ordering of
the SET_* ioctls.
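
On the kernel side, the restore path built on that mechanism would look
roughly like the sketch below. restore_vmxon(), enter_l2_from_vmcs12(),
VMX_STATE_GUEST_MODE and struct vmx_nested_state are made-up names for
the sketch, not the actual patch; only the shape of the flow matters.

static int vmx_set_nested_state_sketch(struct kvm_vcpu *vcpu,
				       const struct vmx_nested_state *state)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);

	/* Re-establish VMX operation and the current-VMCS pointer. */
	if (restore_vmxon(vcpu, state->vmxon_pa))
		return -EINVAL;
	vmx->nested.current_vmptr = state->vmcs12_pa;

	/* Load the VMCS12 image from the payload into the cache... */
	memcpy(vmx->nested.cached_vmcs12, state->vmcs12, VMCS12_SIZE);

	/* ...and, if the vCPU was in L2, rebuild vmcs02 from it via the
	 * same path a guest VMLAUNCH/VMRESUME takes.  The L2 guest state
	 * then comes from the VMCS12 image, so it does not matter what
	 * the other SET_* ioctls may have written into vmcs01 earlier. */
	if (state->flags & VMX_STATE_GUEST_MODE)
		return enter_l2_from_vmcs12(vcpu);

	return 0;
}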

Today, the cached VMCS12 is loaded when the guest executes VMPTRLD,
primarily as a defense against the guest modifying VMCS12 fields in
memory after the hypervisor has checked their validity. There were a
lot of time-of-check to time-of-use security issues before the cached
VMCS12 was introduced. Conveniently, all but the host state of the
cached VMCS12 is dead once the vCPU enters L2, so it seemed like a
reasonable place to stuff the current L2 state for later restoration.
But why pass the cached VMCS12 as a separate vCPU state component
rather than writing it back to guest memory as part of the "save vCPU
state" sequence?

One reason is that it is a bit awkward for GET_NESTED_STATE to modify
guest memory. I don't know about qemu, but our userspace agent expects
guest memory to be quiesced by the time it starts going through its
sequence of GET_* ioctls. Sure, we could introduce a pre-migration
ioctl, but is that the best way to handle this? Another reason is that
it is a bit awkward for SET_NESTED_STATE to require guest memory.
Again, I don't know about qemu, but our userspace agent does not
expect any guest memory to be available when it starts going through
its sequence of SET_* ioctls. Sure, we could prefetch the guest page
containing the current VMCS12, but is that better than simply
including the current VMCS12 in the NESTED_STATE payload? Moreover,
these unpredictable (from the guest's point of view) updates to guest
memory leave a bad taste in my mouth (much like SMM).
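
Concretely, the save sequence I'd like to be able to run looks
something like this. KVM_GET_SREGS and KVM_GET_VCPU_EVENTS are the
existing ioctls; KVM_GET_NESTED_STATE, struct vmx_nested_state and
MAX_VMCS12_SIZE are placeholders for the proposal:

struct kvm_sregs sregs;
struct kvm_vcpu_events events;
struct vmx_nested_state *nested;

/* Guest memory is already quiesced here; the VMCS12 image travels in
 * the ioctl payload instead of being written back to the guest page
 * backing the current VMCS. */
nested = calloc(1, sizeof(*nested) + MAX_VMCS12_SIZE);

ioctl(vcpu_fd, KVM_GET_SREGS, &sregs);
ioctl(vcpu_fd, KVM_GET_VCPU_EVENTS, &events);
ioctl(vcpu_fd, KVM_GET_NESTED_STATE, nested);  /* no guest-memory side effects */

/* ...serialize sregs, events, and the nested-state blob for the target... */

The restore side mirrors this with the corresponding SET_* ioctls,
before any guest memory has been populated on the target.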

Perhaps qemu doesn't have the same limitations that our userspace
agent has, and I can certainly see why you would dismiss my concerns
if you are only interested in qemu as a userspace agent for kvm. At
the same time, I hope you can understand why I am not excited to be
drawn down a path that's going to ultimately mean more headaches for
me in my environment. AFAICT, the proposed API doesn't introduce any
additional headaches for those that use qemu. The principal objections
appear to be the "blob" of data, completely unstructured in the eyes
of the userspace agent, and the redundancy with state already
extracted by existing APIs. Is that right?


On Tue, Dec 19, 2017 at 9:40 AM, David Hildenbrand <david@xxxxxxxxxx> wrote:
> On 19.12.2017 18:33, David Hildenbrand wrote:
>> On 19.12.2017 18:26, Jim Mattson wrote:
>>> Yes, it can be done that way, but what makes this approach technically
>>> superior to the original API?
>>
>> a) not having to migrate data twice
>> b) not having to think about a proper API to get data in/out
>>
>> All you need to know is, if the guest was in nested mode when migrating,
>> no? That would be a simple flag.
>>
>
> (of course in addition, vmcsptr and if vmxon has been called).
>
> But anyhow, if you have good reasons why you want to introduce and
> maintain a new API, feel free to do so. Most likely I am missing
> something important here :)
>
>
> --
>
> Thanks,
>
> David / dhildenb


