On Wed, 2021-06-23 at 16:01 +0300, Maxim Levitsky wrote:
> On Wed, 2021-06-23 at 11:39 +0200, Paolo Bonzini wrote:
> > On 23/06/21 09:44, Vitaly Kuznetsov wrote:
> > > - RFC: I'm not 100% sure my 'smart' idea to use currently-unused HSAVE area
> > > is that smart. Also, we don't even seem to check that L1 set it up upon
> > > nested VMRUN so hypervisors which don't do that may remain broken. A very
> > > much needed selftest is also missing.
> >
> > It's certainly a bit weird, but I guess it counts as smart too. It
> > needs a few more comments, but I think it's a good solution.
> >
> > One could delay the backwards memcpy until vmexit time, but that would
> > require a new flag so it's not worth it for what is a pretty rare and
> > already expensive case.
> >
> > Paolo
> >
>
> Hi!
>
> I did some homework on this now and I would like to share a few of my
> thoughts on this:
>
> First of all, what caught my attention is the way we intercept the #SMI
> (this isn't 100% related to the bug but still worth talking about IMHO)
>
> A. Bare metal: Looks like SVM allows intercepting SMI, with SVM_EXIT_SMI,
> with the intention of then entering the BIOS SMM handler manually using
> the SMM_CTL msr.
> On bare metal we do set the INTERCEPT_SMI but we emulate the exit as a nop.
> I guess on bare metal there are some undocumented bits that the BIOS sets
> which make the CPU ignore that SMI intercept and still take the #SMI
> handler normally, but I wonder if we could still break some motherboard
> code due to that.
>
>
> B. Nested: If #SMI is intercepted, then it causes a nested VMEXIT.
> Since KVM does enable the SMI intercept, when it runs nested it means that
> all SMIs that nested KVM gets are emulated as NOP, and L1's SMI handler is
> not run.
>
>
> About the issue that was fixed in this patch. Let me try to understand how
> it would work on bare metal:
>
> 1. A guest is entered. Host state is saved to the VM_HSAVE_PA area (or
> stashed somewhere in the CPU)
>
> 2. #SMI (without intercept) happens
>
> 3. CPU has to exit SVM and start running the host SMI handler. It loads the
> SMM state without touching VM_HSAVE_PA, runs the SMI handler, then once it
> RSMs, it restores the guest state from the SMM area and continues the guest
>
> 4. Once a normal VMexit happens, the host state is restored from VM_HSAVE_PA
>
> So host state indeed can't be saved to vmcb01.
>
> To be honest, I think I would prefer not to use L1's hsave area but rather
> add back our 'hsave' in KVM and always store the L1 host state there on
> nested entry.
>
> This way we will avoid touching the vmcb01 at all and both solve the issue
> and reduce code complexity.
> (Copying of L1 host state to what basically is the L1 guest state area and
> back even has a comment to explain why it was possible to do so, before you
> discovered that this doesn't work with SMM.)

I need more coffee today. The comment is somewhat wrong actually.
When L1 switches to L2, its HSAVE area is L1 guest state, but L1 is a
"host" vs L2, so it is host state.
The copying is more between KVM's register cache and the vmcb.

So maybe backing it up as this patch does is the best solution yet.
I will take a more in-depth look at this soon.

Best regards,
	Maxim Levitsky

>
> Thanks again for fixing this bug!
>
> Best regards,
> 	Maxim Levitsky
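
For reference, the bare-metal sequence in steps 1-4 above can be written down as a toy model in C. This is only a sketch to make the state flow concrete: every name in it (struct toy_cpu, toy_vmrun() and so on) is made up for illustration and is not a KVM or hardware interface, and real hardware saves far more state than the two registers used here. The point it tries to show is that the #SMI/RSM pair only uses the SMM save area, so whatever VMRUN stored in the host save area is still intact when the later VMEXIT reads it back.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: all names are illustrative, not KVM or hardware APIs. */
struct cpu_state {
	unsigned long rip;
	unsigned long cr3;
};

struct toy_cpu {
	struct cpu_state cur;   /* state currently loaded in the CPU */
	struct cpu_state hsave; /* stands in for the VM_HSAVE_PA area */
	struct cpu_state smram; /* stands in for the SMM state save area */
	bool in_guest;
	bool in_smm;
};

/* Step 1: VMRUN saves the host state to hsave and loads the guest. */
static void toy_vmrun(struct toy_cpu *c, struct cpu_state guest)
{
	c->hsave = c->cur;
	c->cur = guest;
	c->in_guest = true;
}

/* Steps 2-3: a non-intercepted #SMI saves the interrupted (guest) state
 * to the SMM area and enters the handler; hsave is not touched. */
static void toy_smi(struct toy_cpu *c, struct cpu_state smm_handler)
{
	c->smram = c->cur;
	c->cur = smm_handler;
	c->in_smm = true;
}

/* Step 3, end: RSM restores the interrupted state from the SMM area. */
static void toy_rsm(struct toy_cpu *c)
{
	assert(c->in_smm);
	c->cur = c->smram;
	c->in_smm = false;
}

/* Step 4: a normal VMEXIT restores the host state from hsave. */
static void toy_vmexit(struct toy_cpu *c)
{
	assert(c->in_guest && !c->in_smm);
	c->cur = c->hsave;
	c->in_guest = false;
}

/* Runs steps 1-4 and reports whether both guest and host state survived. */
bool toy_host_state_survives_smi(void)
{
	struct cpu_state host  = { .rip = 0x1000, .cr3 = 0xa000 };
	struct cpu_state guest = { .rip = 0x2000, .cr3 = 0xb000 };
	struct cpu_state smm   = { .rip = 0x8000, .cr3 = 0 };
	struct toy_cpu c = { .cur = host };
	bool guest_ok;

	toy_vmrun(&c, guest);
	toy_smi(&c, smm);
	toy_rsm(&c);
	guest_ok = c.cur.rip == guest.rip && c.cur.cr3 == guest.cr3;
	toy_vmexit(&c);

	return guest_ok && c.cur.rip == host.rip && c.cur.cr3 == host.cr3;
}
```

In this model, keeping the host state in a buffer that the SMM handler also runs from would break step 4, which is the bare-metal analogue of why the L1 host state cannot simply live in vmcb01.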