Consider the following scenario:

L1 has never successfully executed VMLAUNCH. It has written 0 to vmcs12's host CR3 field using VMWRITE, but the current host CR3 value is actually 3e7000. It has written some illegal control field that the L0 KVM doesn't check itself, but defers to the hardware checks on vmcs02 instead. So, when L1 tries to execute VMLAUNCH, L0 follows this path for "VM-entry to vmcs02 failed due to invalid control field(s)." Your change would set CR3 to 0, which is incorrect. CR3 should actually be set to 3e7000.

Now, if L0 is sane and using EPT, then it can find the correct L1 CR3 value in vmcs01's Guest CR3 field, but if for some reason L0 is using shadow paging to execute L1, that won't work. Similarly, the correct L1 CR4 value should be in vmcs01's CR4 read shadow field.

You can't just assume that L1 has written values to the vmcs12 host fields that actually match the current host values. There is nothing in the architecture that would require this behavior.

On Wed, Feb 7, 2018 at 10:22 PM, Wanpeng Li <kernellwp@xxxxxxxxx> wrote:
> 2018-02-08 0:57 GMT+08:00 Jim Mattson <jmattson@xxxxxxxxxx>:
>> vmcs12->host_cr[34] does not contain the up-to-date values when L1 is
>> running. L1 can vmwrite any values there. We know at this point that
>
> It will incur a vmexit to emulate L1 vmwrites vmcs12->host_cr[34] even
> if vmcs shadow is enabled since host_cr[34] is not shadowed in the
> bitmap, why it is not up-to-date when L1 is running?
>
> Regards,
> Wanpeng Li
>
>> they are legal (because we checked them), but that's about it. If the
>> VMLAUNCH/VMRESUME of vmcs12 fails for "invalid control field," there
>> is no VM-exit from L2 to L1, and these fields are not loaded. Instead,
>> execution just falls through to the next instruction with VMFailValid
>> semantics.
>>
>> On Wed, Feb 7, 2018 at 12:31 AM, Wanpeng Li <kernellwp@xxxxxxxxx> wrote:
>>> 2018-02-07 0:58 GMT+08:00 Jim Mattson <jmattson@xxxxxxxxxx>:
>>>> On Mon, Feb 5, 2018 at 4:57 PM, Wanpeng Li <kernellwp@xxxxxxxxx> wrote:
>>>>
>>>>> This is effective one, what I restore in this patch is
>>>>> achitectural/guest visible.
>>>>
>>>> This patch doesn't "restore" the guest visible CR4 to its value at the
>>>> time of VMLAUNCH/VMRESUME. It loads a new CR4 value from the vmcs12.
>>>> That behavior is incorrect.
>>>
>>> You have another pointing out about this.
>>> https://lkml.org/lkml/2018/2/5/518 vmcs12->host_cr3/host_cr4 has the
>>> up-to-date value when L1 is running, it is still up-to-date after
>>> vmexit due to L1 executes VMLAUNCH/VMRESUME, I think the value stays
>>> the same before L0 emulates the VMLAUNCH/VMRESUME, according to below
>>> comments, why vmcs12->host_cr3/cr4 is not the value which we should
>>> restore?
>>>
>>> * After an early L2 VM-entry failure, we're now back
>>> * in L1 which thinks it just finished a VMLAUNCH or
>>> * VMRESUME instruction
>>>
>>> Regards,
>>> Wanpeng Li
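
For illustration only, the scenario at the top of this message can be modeled in a few lines of C. This is a toy sketch, not KVM code: the struct and function names below are made up for the example, and the value 3e7000 is assumed to be hexadecimal. It only contrasts the two choices discussed here, reloading CR3 from vmcs12's host field versus leaving L1's CR3 as it was when VMLAUNCH was attempted.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-ins; these are NOT KVM's real structures. */
struct toy_vmcs12 {
    uint64_t host_cr3;  /* whatever L1 last wrote with VMWRITE; may be arbitrary */
};

struct toy_l1_state {
    uint64_t cr3;       /* the CR3 value L1 is actually running with */
};

/*
 * Policy A (what the thread argues against): after a failed VM-entry,
 * load CR3 from vmcs12's host-state field.
 */
uint64_t toy_cr3_from_vmcs12(const struct toy_l1_state *l1,
                             const struct toy_vmcs12 *vmcs12)
{
    (void)l1;  /* L1's real CR3 is ignored under this policy */
    return vmcs12->host_cr3;
}

/*
 * Policy B: the failed VMLAUNCH falls through to the next instruction,
 * so L1 keeps the CR3 it had when it executed VMLAUNCH.
 */
uint64_t toy_cr3_preserved(const struct toy_l1_state *l1,
                           const struct toy_vmcs12 *vmcs12)
{
    (void)vmcs12;  /* the host-state field is never consulted */
    return l1->cr3;
}
```

With `l1.cr3 = 0x3e7000` and `vmcs12.host_cr3 = 0`, the first function yields 0 and the second yields 0x3e7000, which is the discrepancy described above.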