On Mon, Jan 8, 2018 at 12:27 PM, David Hildenbrand <david@xxxxxxxxxx> wrote:
> On 08.01.2018 21:19, Jim Mattson wrote:
>> Even more trivially, what if the L2 VM is configured never to leave
>> VMX non-root operation? Then we never exit to userspace?
>
> Well we would make the PC point at the VMLAUNCH. Then exit to userspace.

That doesn't work, for so many reasons.

1. It's not sufficient to just roll back the instruction pointer. You
also need to roll back CS, CR0, CR3 (and possibly the PDPTEs), and CR4,
so that the virtual address of the instruction pointer will actually
map to the same physical address as it did the first time. If the page
tables have changed, or the L1 guest has overwritten the
VMLAUNCH/VMRESUME instruction, then you're out of luck.
2. As you point out, interrupts are a problem. Interrupts can't be
delivered in this context, because the vCPU shouldn't be in this
context (and the guest may have already observed the transition to
L2).
3. I'm assuming that you're planning to store most of the current L2
state in the cached VMCS12, at least where you can. Even so, the next
"VM-entry" can't perform any of the normal VM-entry actions that would
clobber the current L2 state that isn't in the cached VMCS12 (e.g.
processing the VM-entry MSR load list). So, you need to have a flag
indicating that this isn't a real VM-entry. That's no better than
carrying the nested_run_pending flag.

> When returning from userspace, try to execute VMLAUNCH again, leading to
> an intercept and us going back into nested mode.
>
> A question would be, what happens to interrupts. They could be
> delivered, but it would look like the guest was not executed. (as we are
> pointing at the VMLAUNCH instruction). Not sure of this is a purely
> theoretical problem.
>
> But the "assigned devices" thing is a real difference to s390x.
> Assigning devices requires special SIE extensions we don't implement for
> vSIE. And there is no such thing as MMIO.
>
> And if there is one case where we need it (assigned devices) - even
> though we might find a way around it - it is very likely that there is more.
>
> --
>
> Thanks,
>
> David / dhildenb
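
To make point 3 above concrete, here is a toy C model of the problem. It
is only a sketch, not KVM code: every type, field, and function name in
it (toy_vcpu, resume_pending, toy_vmentry, and so on) is made up for
illustration, and a single integer stands in for an MSR on the VM-entry
MSR load list. It shows that re-executing the "VM-entry" after rewinding
to VMLAUNCH clobbers live L2 state unless some flag marks the entry as
not being a real one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: none of these names exist in KVM. */
struct toy_vcpu {
    uint64_t l2_msr;          /* live L2 MSR value, not cached in VMCS12 */
    uint64_t entry_msr_load;  /* value the VM-entry MSR load list writes */
    bool in_l2;               /* vCPU is in VMX non-root operation */
    bool resume_pending;      /* hypothetical "not a real VM-entry" flag */
};

/* Models the VMLAUNCH intercept that moves the vCPU into L2. */
static void toy_vmentry(struct toy_vcpu *v)
{
    /* A real VM-entry processes the MSR load list; a resumed one must not. */
    if (!v->resume_pending)
        v->l2_msr = v->entry_msr_load;
    v->resume_pending = false;
    v->in_l2 = true;
}

/* Models L2 changing an MSR after it has started running. */
static void toy_l2_writes_msr(struct toy_vcpu *v, uint64_t val)
{
    assert(v->in_l2);
    v->l2_msr = val;
}

/*
 * Models "point the PC back at VMLAUNCH and exit to userspace".
 * mark_resume selects whether the extra flag is carried along.
 */
static void toy_rewind_to_vmlaunch(struct toy_vcpu *v, bool mark_resume)
{
    v->in_l2 = false;
    v->resume_pending = mark_resume;
}
```

Without the flag, the second toy_vmentry() overwrites whatever L2 wrote
to the MSR. With it, the L2 value survives, but then the flag has to be
carried across the exit to userspace just as nested_run_pending would be.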