On Mon, Jan 8, 2018 at 1:46 PM, David Hildenbrand <david@xxxxxxxxxx> wrote:
>> The vmcs01 does serve as a cache of L1 state at the time of VM-entry,
>> so if we simply restored the vmcs01 state, that would take care of
>> most of the rollback issues, as long as we don't deliver any
>> interrupts in this context. However, I would like to see the
>> vmcs01/vmcs02 separation go away at some point. svm.c seems to do fine
>> with just one VMCB.
>
> Interesting point, might make things easier for VMX.
>
>>
>>>> If the page
>>>> tables have changed, or the L1 guest has overwritten the
>>>> VMLAUNCH/VMRESUME instruction, then you're out of luck.
>>>
>>> Page tables getting changed by other CPUs is actually a good point. But
>>> I would consider both as "theoretical" problems. At least compared to
>>> the interrupt stuff, which could also happen on guests behaving in a
>>> more sane way.
>>
>> My preference is for solutions that are architecturally correct,
>> thereby solving the theoretical problems as well as the empirical
>> ones. However, I grant that the Linux community leans the other way in
>> general.
>
> I usually agree, unless it makes the code horribly complicated without
> any real benefit. (e.g. for corner cases like this one: a CPU modifying
> instruction text that another CPU is currently executing)

I don't feel that one additional bit of serialized state is horribly
complicated. It's aesthetically unpleasant, to be sure, but not
horribly complicated. And it's considerably less complicated than your
proposal. :-) (Two rough sketches of what that bit looks like, and of
the gated VM-entry it implies, follow at the end of this message.)

>>>> 3. I'm assuming that you're planning to store most of the current L2
>>>> state in the cached VMCS12, at least where you can. Even so, the next
>>>> "VM-entry" can't perform any of the normal VM-entry actions that would
>>>> clobber the current L2 state that isn't in the cached VMCS12 (e.g.
>>>> processing the VM-entry MSR load list). So, you need to have a flag
>>>> indicating that this isn't a real VM-entry. That's no better than
>>>> carrying the nested_run_pending flag.
>>>
>>> Not sure if that would really be necessary (would have to look into the
>>> details first). But it sounds like nested_run_pending is unavoidable on
>>> x86. So I'd better get used to QEMU dealing with nested CPU state (which
>>> is somehow scary to me - an emulator getting involved in nested
>>> execution - what could go wrong :) )
>>
>> For save/restore, QEMU doesn't actually have to know what the flag
>> means. It just has to pass it on. (Our userspace agent doesn't know
>> anything about the meaning of the flags in the kvm_vmx_state
>> structure.)
>
> I agree on save/restore. My comment was rather about involving an L0
> emulator when running an L2 guest. But as Paolo pointed out, this can't
> really be avoided.
>
> --
>
> Thanks,
>
> David / dhildenb
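
To make "one additional bit of serialized state" concrete, here is a
minimal sketch of a userspace-visible nested-state blob. Only the name
kvm_vmx_state is taken from the thread above; the layout, the field
names, and the flag name are assumptions for illustration, not the
actual ABI:

#include <stdint.h>

#define VMX_STATE_NESTED_RUN_PENDING	(1u << 0)	/* assumed flag name */

struct kvm_vmx_state {			/* hypothetical layout */
	uint64_t vmxon_ptr;		/* L1's VMXON region, if post-VMXON */
	uint64_t current_vmcs12_ptr;	/* L1's current VMCS, if any */
	uint32_t flags;			/* opaque cookie for userspace */
	uint32_t vmcs12_size;		/* size of the cached vmcs12 below */
	uint8_t  vmcs12[];		/* cached vmcs12 contents */
};

Userspace's only obligation is to round-trip flags unmodified between
save and restore, which is exactly the "just pass it on" behavior
described above.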
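
And a rough sketch of point 3, the gated VM-entry. All type and
function names here are made up for illustration, not actual vmx.c
code. The idea is that the first entry into L2 after a restore reloads
the cached vmcs12 but skips the one-shot VM-entry actions, because
those already ran before the state was saved:

#include <stdbool.h>

struct vcpu_vmx { int unused; };	/* stand-in for the real type */

static void load_vmcs12_guest_state(struct vcpu_vmx *vmx) { (void)vmx; }
static void process_vmentry_msr_load_list(struct vcpu_vmx *vmx) { (void)vmx; }

/*
 * from_vmentry is true for a genuine emulated VMLAUNCH/VMRESUME and
 * false for the first entry into L2 after restoring saved state.
 */
static void enter_vmcs02(struct vcpu_vmx *vmx, bool from_vmentry)
{
	/* Always: propagate the cached vmcs12 into the hardware VMCS. */
	load_vmcs12_guest_state(vmx);

	if (from_vmentry) {
		/*
		 * One-shot VM-entry actions, e.g. processing the VM-entry
		 * MSR load list.  They already ran before the state was
		 * saved, so repeating them on restore would clobber L2
		 * state that lives outside the cached vmcs12.
		 */
		process_vmentry_msr_load_list(vmx);
	}
}

On restore, the kernel would call enter_vmcs02(vmx, false) when the
saved flags had VMX_STATE_NESTED_RUN_PENDING set, so the "VM-entry"
completes without re-running its side effects.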