On Wed, 2022-08-10 at 14:00 +0200, Thomas Lamprecht wrote: > On 03/08/2022 17:49, Maxim Levitsky wrote: > > This patch series is a result of long debug work to find out why > > sometimes guests with win11 secure boot > > were failing during boot. > > > > During writing a unit test I found another bug, turns out > > that on rsm emulation, if the rsm instruction was done in real > > or 32 bit mode, KVM would truncate the restored RIP to 32 bit. > > > > I also refactored the way we write SMRAM so it is easier > > now to understand what is going on. > > > > The main bug in this series which I fixed is that we > > allowed #SMI to happen during the STI interrupt shadow, > > and we did nothing to both reset it on #SMI handler > > entry and restore it on RSM. > > > > V3: addressed most of the review feedback from Sean (thanks!) > > > > Best regards, > > Maxim Levitsky > > > > Maxim Levitsky (13): > > bug: introduce ASSERT_STRUCT_OFFSET > > KVM: x86: emulator: em_sysexit should update ctxt->mode > > KVM: x86: emulator: introduce emulator_recalc_and_set_mode > > KVM: x86: emulator: update the emulation mode after rsm > > KVM: x86: emulator: update the emulation mode after CR0 write > > KVM: x86: emulator/smm: number of GPRs in the SMRAM image depends on > > the image format > > KVM: x86: emulator/smm: add structs for KVM's smram layout > > KVM: x86: emulator/smm: use smram structs in the common code > > KVM: x86: emulator/smm: use smram struct for 32 bit smram load/restore > > KVM: x86: emulator/smm: use smram struct for 64 bit smram load/restore > > KVM: x86: SVM: use smram structs > > KVM: x86: SVM: don't save SVM state to SMRAM when VM is not long mode > > capable > > KVM: x86: emulator/smm: preserve interrupt shadow in SMRAM > > > > arch/x86/include/asm/kvm_host.h | 11 +- > > arch/x86/kvm/emulate.c | 305 +++++++++++++++++--------------- > > arch/x86/kvm/kvm_emulate.h | 223 ++++++++++++++++++++++- > > arch/x86/kvm/svm/svm.c | 30 ++-- > > arch/x86/kvm/vmx/vmcs12.h | 5 +- > > arch/x86/kvm/vmx/vmx.c | 4 +- > > arch/x86/kvm/x86.c | 175 +++++++++--------- > > include/linux/build_bug.h | 9 + > > 8 files changed, 497 insertions(+), 265 deletions(-) > > > > FWIW, we tested the v2 on 5.19 and backported it to 5.15 with minimal adaption > required (mostly unrelated context change) and now also updated that backport > to the v3 of this patch series. > > Our reproducer got fixed with either, but v3 now also avoids triggering logs like: > > Jul 29 04:59:18 mits4 QEMU[2775]: kvm: Could not update PFLASH: Stale file handle > Jul 29 04:59:18 mits4 QEMU[2775]: kvm: Could not update PFLASH: Stale file handle > Jul 29 07:15:46 mits4 kernel: kvm: vcpu 1: requested 191999 ns lapic timer period limited to 200000 ns > Jul 29 11:06:31 mits4 kernel: kvm: vcpu 1: requested 105786 ns lapic timer period limited to 200000 ns > > which happened earlier (not sure how deep that correlates with the v2 vs. v3, but > it stuck out, so mentioning for sake of completeness). This is likely just a coincidence because V3 should not contain any functional changes vs v2. (If I remember correctly) > > For the backport to 5.15 we skipped "KVM: x86: emulator/smm: number of GPRs in > the SMRAM image depends on the image format", as that constant was there yet and > the actual values stayed the same for our case FWICT and adapted to slight context > changes for the others. > > So, the approach seems to fix our issue and we are already rolling out a kernel > to users for testing and got positive feedback there too. > > With above in mind: > > Tested-by: Thomas Lamprecht <t.lamprecht@xxxxxxxxxxx> Thank you very much for testing! > > It would be also great to see this backported to still supported upstream stable kernels > from 5.15 onwards, as there the TDP MMU got by default enabled, and that is at least > increasing the chance of our reproducer to trigger dramatically. Best regards, Maxim Levitsky > > thx & cheers > Thomas >