On Wed, 2021-06-23 at 15:21 +0200, Paolo Bonzini wrote:
> On 23/06/21 15:01, Maxim Levitsky wrote:
> > I did some homework on this now and I would like to share a few of
> > my thoughts on it:
> > 
> > First of all, my attention was caught by the way we intercept #SMI
> > (this isn't 100% related to the bug, but still worth talking about
> > IMHO).
> > 
> > A. Bare metal: it looks like SVM allows intercepting SMI with
> > SVM_EXIT_SMI, with the intention of then entering the BIOS SMM
> > handler manually using the SMM_CTL MSR.
> 
> ... or just using STGI, which is what happens for KVM.  This is in
> the manual: "The hypervisor may respond to the #VMEXIT(SMI) by
> executing the STGI instruction, which causes the pending SMI to be
> taken immediately".

Right, I didn't notice that; it makes sense now. Thanks for the
explanation!

> 
> It *should* work for KVM to just not intercept SMI, but it adds more
> complexity for no particular gain.

It would be nice to do so to increase the testing coverage of running
a nested KVM. I'll add a hack for that in my nested kernel (a rough
sketch of what I have in mind is at the end of this mail).

> 
> > On bare metal we do set INTERCEPT_SMI, but we emulate the exit as
> > a nop. I guess on bare metal there are some undocumented bits that
> > the BIOS sets which make the CPU ignore that SMI intercept and
> > still take the #SMI handler normally, but I wonder if we could
> > still break some motherboard code because of that.
> > 
> > B. Nested: if #SMI is intercepted, then it causes a nested VMEXIT.
> > Since KVM does enable the SMI intercept, when it runs nested this
> > means that all SMIs the nested KVM gets are emulated as a NOP, and
> > L1's SMI handler is not run.
> 
> No, this is incorrect.  Note that svm_check_nested_events does not
> clear smi_pending the way vmx_check_nested_events does it for
> nmi_pending.  So the interrupt is still there and will be injected
> on the next STGI.

I didn't check the code but just assumed that the same issue would be
present. Now it makes sense; I totally forgot about STGI.

Thanks,
Best regards,
	Maxim Levitsky

> 
> Paolo
> 
> > 
> > About the issue that was fixed in this patch, let me try to
> > understand how it would work on bare metal:
> > 
> > 1. A guest is entered. Host state is saved to the VM_HSAVE_PA area
> >    (or stashed somewhere in the CPU).
> > 
> > 2. #SMI (without intercept) happens.
> > 
> > 3. The CPU has to exit SVM and start running the host SMI handler:
> >    it loads the SMM state without touching VM_HSAVE_PA, runs the
> >    SMI handler, and once it executes RSM it restores the guest
> >    state from the SMM save area and continues the guest.
> > 
> > 4. Once a normal VMEXIT happens, the host state is restored from
> >    VM_HSAVE_PA.
> > 
> > So the host state indeed can't be saved to vmcb01.
> > 
> > To be honest, I think I would prefer not to use L1's hsave area,
> > but rather to add back our own 'hsave' in KVM and always store the
> > L1 host state there on nested entry.
> > 
> > This way we would avoid touching vmcb01 at all, which both solves
> > the issue and reduces code complexity.
> > (Copying the L1 host state to what is basically the L1 guest state
> > area, and back, even has a comment explaining why it (was)
> > possible to do so, before you discovered that this doesn't work
> > with SMM.)
> > 
> > Thanks again for fixing this bug!
> > 
> > Best regards,
> > 	Maxim Levitsky
> > 
> 
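
P.S. To illustrate the "hack" I mentioned above: something along the
lines of the sketch below, i.e. a knob that lets the host run without
INTERCEPT_SMI so that L1's own SMI handling actually gets exercised
when running nested. This is only a sketch, not an actual patch; the
intercept_smi parameter and the helper name are made up here for
illustration, only svm_set_intercept/svm_clr_intercept and
INTERCEPT_SMI are existing KVM names.

    /* Hypothetical module parameter: when false, don't set up the
     * SMI intercept, so SMIs are taken by the host directly and an
     * L1 KVM's SMI paths get tested when we run nested.
     */
    static bool intercept_smi = true;
    module_param(intercept_smi, bool, 0444);

    /* Hypothetical helper, to be called from init_vmcb() while the
     * other intercepts are being set up.
     */
    static void svm_setup_smi_intercept(struct vcpu_svm *svm)
    {
            if (intercept_smi)
                    svm_set_intercept(svm, INTERCEPT_SMI);
            else
                    svm_clr_intercept(svm, INTERCEPT_SMI);
    }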