> The vmcs01 does serve as a cache of L1 state at the time of VM-entry,
> so if we simply restored the vmcs01 state, that would take care of
> most of the rollback issues, as long as we don't deliver any
> interrupts in this context. However, I would like to see the
> vmcs01/vmcs02 separation go away at some point. svm.c seems to do fine
> with just one VMCB.

Interesting point, might make things easier for VMX.

>>> If the page tables have changed, or the L1 guest has overwritten the
>>> VMLAUNCH/VMRESUME instruction, then you're out of luck.
>>
>> Page tables getting changed by other CPUs is actually a good point. But
>> I would consider both as "theoretical" problems. At least compared to
>> the interrupt stuff, which could also happen on guests behaving in a
>> more sane way.
>
> My preference is for solutions that are architecturally correct,
> thereby solving the theoretical problems as well as the empirical
> ones. However, I grant that the Linux community leans the other way in
> general.

I usually agree, unless it makes the code horribly complicated without
any real benefit (e.g. for corner cases like this one: one CPU modifying
instruction text that another CPU is currently executing).

>>> 3. I'm assuming that you're planning to store most of the current L2
>>> state in the cached VMCS12, at least where you can. Even so, the next
>>> "VM-entry" can't perform any of the normal VM-entry actions that would
>>> clobber the current L2 state that isn't in the cached VMCS12 (e.g.
>>> processing the VM-entry MSR load list). So, you need to have a flag
>>> indicating that this isn't a real VM-entry. That's no better than
>>> carrying the nested_run_pending flag.
>>
>> Not sure if that would really be necessary (would have to look into the
>> details first). But sounds like nested_run_pending seems unavoidable on
>> x86.
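The idea conceded above can be sketched in a few lines of C. All names here are hypothetical stand-ins for illustration, not the actual KVM code: a nested_run_pending-style flag marks a "VM-entry" that merely completes an interrupted one (e.g. after save/restore), so the one-shot entry actions that would clobber live L2 state, such as processing the VM-entry MSR load list, must not be redone.

```c
#include <stdint.h>

/* Hypothetical names for illustration -- not the real KVM structures. */
#define NESTED_RUN_PENDING (1u << 0)

struct nested_state {
	uint32_t flags;
	int msr_loads_done;	/* stand-in for one-shot VM-entry work */
};

/*
 * If NESTED_RUN_PENDING is set, this entry only completes an
 * interrupted VM-entry, so skip the one-shot work (MSR load list
 * processing etc.) that has already happened; otherwise do it.
 */
static void nested_vmx_enter(struct nested_state *s)
{
	if (!(s->flags & NESTED_RUN_PENDING))
		s->msr_loads_done++;	/* real entry: do the one-shot work */
	s->flags &= ~NESTED_RUN_PENDING;	/* entry has now completed */
}
```

The point of the flag is exactly that the two cases are indistinguishable from the L2 state alone, which is why it has to be carried (and migrated) explicitly.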
>> So I'd better get used to QEMU dealing with nested CPU state (which
>> is somehow scary to me - an emulator getting involved in nested
>> execution - what could go wrong :) )
>
> For save/restore, QEMU doesn't actually have to know what the flag
> means. It just has to pass it on. (Our userspace agent doesn't know
> anything about the meaning of the flags in the kvm_vmx_state
> structure.)

I agree on save/restore. My comment was rather about involving an L0
emulator when running an L2 guest. But as Paolo pointed out, this can't
really be avoided.

-- 

Thanks,

David / dhildenb