On 08.01.2018 18:36, Paolo Bonzini wrote:
> On 08/01/2018 11:35, David Hildenbrand wrote:
>> Thinking about it, I agree. It might be simpler/cleaner to transfer the
>> "loaded" VMCS. But I think we should take care of only transferring data
>> that actually is CPU state and not special to our current
>> implementation. (e.g. nested_run_pending I would say is special to our
>> current implementation, but we can discuss)
>>
>> So what I would consider VMX state:
>> - vmxon
>> - vmxon_ptr
>> - vmptr
>> - cached_vmcs12
>> - ... ?
>
> nested_run_pending is in the same boat as the various
> KVM_GET_VCPU_EVENTS flags (e.g. nmi.injected vs. nmi.pending). It's not
> "architectural" state, but it's part of the state machine so it has to
> be serialized.

I am wondering if we can get rid of it - in fact we could, if we manage
to go out of VMX mode every time we go to user space. As soon as we put
it into the official VMX migration protocol, we have to support it
forever. Now seems like the last time we can change that.

I have the following ideas in mind (unfortunately I don't have time to
look into the details) to make nested_run_pending completely internal
state (rough sketches at the end of this mail):

1. When going into user space, if we have nested_run_pending=true, set
   it to false and rewind the instruction pointer to point again at the
   VMLAUNCH/VMRESUME instruction. The next VCPU run will simply continue
   executing the nested guest (by trying to execute the
   VMLAUNCH/VMRESUME again).

2. When going into user space, if we have nested_run_pending=true, set
   it to false and fake another VMX exit that has no side effects (if
   possible). Something like the NULL intercept we have on s390x. But I
   have no idea how that plays e.g. with KVM_GET_VCPU_EVENTS (again,
   unfortunately no time to look into all the details).

If we could get 1. running, it would be cool, but I am not sure if it is
feasible. If not, then we will most likely have to stick with
nested_run_pending :( and, as Paolo said, migrate it.

Thanks

>
> Thanks,
>
> Paolo
>

-- 

Thanks,

David / dhildenb
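
Sketch 1: to make the "what I would consider VMX state" list above a bit
more concrete, this is roughly the blob I have in mind. Field names and
the size constant are purely illustrative, not a proposal for the actual
ABI, and whether nested_run_pending belongs in there is exactly the open
question:

    #include <stdbool.h>
    #include <stdint.h>

    /* Purely illustrative, not the real KVM ABI. */
    #define CACHED_VMCS12_SIZE 0x1000  /* assumption: one page for the vmcs12 copy */

    struct nested_vmx_migration_state {
        bool     vmxon;              /* vCPU has executed VMXON */
        uint64_t vmxon_ptr;          /* guest-physical address of the VMXON region */
        uint64_t vmptr;              /* guest-physical address of the current VMCS, or -1 */
        uint8_t  cached_vmcs12[CACHED_VMCS12_SIZE]; /* in-kernel copy of the current vmcs12 */
        /* ... ? */
        bool     nested_run_pending; /* the implementation detail we would like to drop */
    };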
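
Sketch 2: a very rough sketch of what I mean with 1., written against
the KVM internals (to_vmx(), kvm_rip_write() and
vmx->nested.nested_run_pending exist today; the saved vmlaunch_rip field
and the hook calling this right before returning to user space are made
up):

    /* Hypothetical helper, called right before returning to user space. */
    static void nested_vmx_cancel_pending_run(struct kvm_vcpu *vcpu)
    {
        struct vcpu_vmx *vmx = to_vmx(vcpu);

        if (!vmx->nested.nested_run_pending)
            return;

        /*
         * Point the guest RIP back at the VMLAUNCH/VMRESUME instruction
         * (saved when we started emulating it - hypothetical field), so
         * that the next KVM_RUN simply retries the nested VM entry ...
         */
        kvm_rip_write(vcpu, vmx->nested.vmlaunch_rip);

        /* ... and the flag never has to leave the kernel. */
        vmx->nested.nested_run_pending = false;
    }

The tricky part, of course, is making sure nothing else (injected
events, interrupt windows, ...) has already been touched at that point -
that is the part I have no time to look into.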