On Mon, Jan 8, 2018 at 1:46 PM, David Hildenbrand <david@xxxxxxxxxx> wrote:
>> The vmcs01 does serve as a cache of L1 state at the time of VM-entry,
>> so if we simply restored the vmcs01 state, that would take care of
>> most of the rollback issues, as long as we don't deliver any
>> interrupts in this context. However, I would like to see the
>> vmcs01/vmcs02 separation go away at some point. svm.c seems to do fine
>> with just one VMCB.
>
> Interesting point, might make things easier for VMX.
>
>>
>>>> If the page
>>>> tables have changed, or the L1 guest has overwritten the
>>>> VMLAUNCH/VMRESUME instruction, then you're out of luck.
>>>
>>> Page tables getting changed by other CPUs is actually a good point. But
>>> I would consider both as "theoretical" problems. At least compared to
>>> the interrupt stuff, which could also happen on guests behaving in a
>>> more sane way.
>>
>> My preference is for solutions that are architecturally correct,
>> thereby solving the theoretical problems as well as the empirical
>> ones. However, I grant that the Linux community leans the other way in
>> general.
>
> I usually agree, unless it makes the code horribly complicated without
> any real benefit. (e.g. for corner cases like this one: a CPU modifying
> instruction text that another CPU is currently executing)

I don't feel that one additional bit of serialized state is horribly
complicated. It's aesthetically unpleasant, to be sure, but not
horribly complicated. And it's considerably less complicated than your
proposal. :-) (Two rough sketches of what that bit looks like, and of
the gated VM-entry it implies, follow at the end of this message.)

>>>> 3. I'm assuming that you're planning to store most of the current L2
>>>> state in the cached VMCS12, at least where you can. Even so, the next
>>>> "VM-entry" can't perform any of the normal VM-entry actions that would
>>>> clobber the current L2 state that isn't in the cached VMCS12 (e.g.
>>>> processing the VM-entry MSR load list). So, you need to have a flag
>>>> indicating that this isn't a real VM-entry. That's no better than
>>>> carrying the nested_run_pending flag.
>>>
>>> Not sure if that would really be necessary (would have to look into the
>>> details first). But it sounds like nested_run_pending is unavoidable on
>>> x86. So I'd better get used to QEMU dealing with nested CPU state (which
>>> is somehow scary to me - an emulator getting involved in nested
>>> execution - what could go wrong :) )
>>
>> For save/restore, QEMU doesn't actually have to know what the flag
>> means. It just has to pass it on. (Our userspace agent doesn't know
>> anything about the meaning of the flags in the kvm_vmx_state
>> structure.)
>
> I agree on save/restore. My comment was rather about involving an L0
> emulator when running an L2 guest. But as Paolo pointed out, this can't
> really be avoided.
>
> --
>
> Thanks,
>
> David / dhildenb
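
To make "one additional bit of serialized state" concrete, here is a
minimal sketch of a userspace-visible nested-state blob. Only the name
kvm_vmx_state is taken from the thread above; the layout, the field
names, and the flag name are assumptions for illustration, not the
actual ABI:

#include <stdint.h>

#define VMX_STATE_NESTED_RUN_PENDING	(1u << 0)	/* assumed flag name */

struct kvm_vmx_state {			/* hypothetical layout */
	uint64_t vmxon_ptr;		/* L1's VMXON region, if post-VMXON */
	uint64_t current_vmcs12_ptr;	/* L1's current VMCS, if any */
	uint32_t flags;			/* opaque cookie for userspace */
	uint32_t vmcs12_size;		/* size of the cached vmcs12 below */
	uint8_t  vmcs12[];		/* cached vmcs12 contents */
};

Userspace's only obligation is to round-trip flags unmodified between
save and restore, which is exactly the "just pass it on" behavior
described above.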
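
And a rough sketch of point 3, the gated VM-entry. All type and
function names here are made up for illustration, not actual vmx.c
code. The idea is that the first entry into L2 after a restore reloads
the cached vmcs12 but skips the one-shot VM-entry actions, because
those already ran before the state was saved:

#include <stdbool.h>

struct vcpu_vmx { int unused; };	/* stand-in for the real type */

static void load_vmcs12_guest_state(struct vcpu_vmx *vmx) { (void)vmx; }
static void process_vmentry_msr_load_list(struct vcpu_vmx *vmx) { (void)vmx; }

/*
 * from_vmentry is true for a genuine emulated VMLAUNCH/VMRESUME and
 * false for the first entry into L2 after restoring saved state.
 */
static void enter_vmcs02(struct vcpu_vmx *vmx, bool from_vmentry)
{
	/* Always: propagate the cached vmcs12 into the hardware VMCS. */
	load_vmcs12_guest_state(vmx);

	if (from_vmentry) {
		/*
		 * One-shot VM-entry actions, e.g. processing the VM-entry
		 * MSR load list.  They already ran before the state was
		 * saved, so repeating them on restore would clobber L2
		 * state that lives outside the cached vmcs12.
		 */
		process_vmentry_msr_load_list(vmx);
	}
}

On restore, the kernel would call enter_vmcs02(vmx, false) when the
saved flags had VMX_STATE_NESTED_RUN_PENDING set, so the "VM-entry"
completes without re-running its side effects.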