Re: [PATCH 7/8] kvm: nVMX: Introduce KVM_CAP_VMX_STATE

On Mon, Jan 8, 2018 at 1:19 PM, David Hildenbrand <david@xxxxxxxxxx> wrote:
> On 08.01.2018 21:59, Jim Mattson wrote:
>> On Mon, Jan 8, 2018 at 12:27 PM, David Hildenbrand <david@xxxxxxxxxx> wrote:
>>> On 08.01.2018 21:19, Jim Mattson wrote:
>>>> Even more trivially, what if the L2 VM is configured never to leave
>>>> VMX non-root operation? Then we never exit to userspace?
>>>
>>> Well, we would make the PC point at the VMLAUNCH, then exit to userspace.
>>
>> That doesn't work, for so many reasons.
>>
>> 1. It's not sufficient to just rollback the instruction pointer. You
>> also need to rollback CS, CR0, CR3 (and possibly the PDPTEs), and CR4,
>> so that the virtual address of the instruction pointer will actually
>> map to the same physical address as it did the first time.
>
> I expect these values to be the same once leaving non-root mode (as the
> CPU itself hasn't executed anything except the nested guest). But yes, it
> could be tricky.

The vmcs01 does serve as a cache of L1 state at the time of VM-entry,
so if we simply restored the vmcs01 state, that would take care of
most of the rollback issues, as long as we don't deliver any
interrupts in this context. However, I would like to see the
vmcs01/vmcs02 separation go away at some point. svm.c seems to do fine
with just one VMCB.

>> If the page
>> tables have changed, or the L1 guest has overwritten the
>> VMLAUNCH/VMRESUME instruction, then you're out of luck.
>
> Page tables getting changed by other CPUs is actually a good point. But
> I would consider both as "theoretical" problems. At least compared to
> the interrupt stuff, which could also happen with guests behaving in a
> more sane way.

My preference is for solutions that are architecturally correct,
thereby solving the theoretical problems as well as the empirical
ones. However, I grant that the Linux community leans the other way in
general.

>> 2. As you point out, interrupts are a problem. Interrupts can't be
>> delivered in this context, because the vCPU shouldn't be in this
>> context (and the guest may have already observed the transition to
>> L2).
>
> Yes, I also see this as the major problem.
>
>> 3. I'm assuming that you're planning to store most of the current L2
>> state in the cached VMCS12, at least where you can. Even so, the next
>> "VM-entry" can't perform any of the normal VM-entry actions that would
>> clobber the current L2 state that isn't in the cached VMCS12 (e.g.
>> processing the VM-entry MSR load list). So, you need to have a flag
>> indicating that this isn't a real VM-entry. That's no better than
>> carrying the nested_run_pending flag.
>
> Not sure if that would really be necessary (would have to look into the
> details first). But it sounds like nested_run_pending is unavoidable on
> x86. So I'd better get used to QEMU dealing with nested CPU state (which
> is somewhat scary to me - an emulator getting involved in nested
> execution - what could go wrong :) )

For save/restore, QEMU doesn't actually have to know what the flag
means. It just has to pass it on. (Our userspace agent doesn't know
anything about the meaning of the flags in the kvm_vmx_state
structure.)

> Good we talked about it (and thanks for your time). I learned a lot today!
>
> --
>
> Thanks,
>
> David / dhildenb


