Re: [PATCH v2 11/11] KVM: x86: emulator/smm: preserve interrupt shadow in SMRAM

Maxim Levitsky <mlevitsk@xxxxxxxxxx> · Tue, 05 Jul 2022 16:38:42 +0300

On Thu, 2022-06-30 at 09:00 -0700, Jim Mattson wrote:
> On Wed, Jun 29, 2022 at 11:00 PM Maxim Levitsky <mlevitsk@xxxxxxxxxx> wrote:
> > 
> > On Wed, 2022-06-29 at 09:31 -0700, Jim Mattson wrote:
> > > On Tue, Jun 21, 2022 at 8:09 AM Maxim Levitsky <mlevitsk@xxxxxxxxxx> wrote:
> > > > When #SMI is asserted, the CPU can be in interrupt shadow
> > > > due to sti or mov ss.
> > > > 
> > > > It is not mandatory in  Intel/AMD prm to have the #SMI
> > > > blocked during the shadow, and on top of
> > > > that, since neither SVM nor VMX has true support for SMI
> > > > window, waiting for one instruction would mean single stepping
> > > > the guest.
> > > > 
> > > > Instead, allow #SMI in this case, but both reset the interrupt
> > > > window and stash its value in SMRAM to restore it on exit
> > > > from SMM.
> > > > 
> > > > This fixes rare failures seen mostly on windows guests on VMX,
> > > > when #SMI falls on the sti instruction which mainfest in
> > > > VM entry failure due to EFLAGS.IF not being set, but STI interrupt
> > > > window still being set in the VMCS.
> > > 
> > > I think you're just making stuff up! See Note #5 at
> > > https://sandpile.org/x86/inter.htm.
> > > 
> > > Can you reference the vendors' documentation that supports this change?
> > > 
> > 
> > First of all, just to note that the actual issue here was that
> > we don't clear the shadow bits in the guest interruptability field
> > in the vmcb on SMM entry, that triggered a consistency check because
> > we do clear EFLAGS.IF.
> > Preserving the interrupt shadow is just nice to have.
> > 
> > 
> > That what Intel's spec says for the 'STI':
> > 
> > "The IF flag and the STI and CLI instructions do not prohibit the generation of exceptions and nonmaskable inter-
> > rupts (NMIs). However, NMIs (and system-management interrupts) may be inhibited on the instruction boundary
> > following an execution of STI that begins with IF = 0."
> > 
> > Thus it is likely that #SMI are just blocked when in shadow, but it is easier to implement
> > it this way (avoids single stepping the guest) and without any user visable difference,
> > which I noted in the patch description, I noted that there are two ways to solve this,
> > and preserving the int shadow in SMRAM is just more simple way.
> 
> It's not true that there is no user-visible difference. In your
> implementation, the SMI handler can see that the interrupt was
> delivered in the interrupt shadow.

Most of the SMI save state area is reserved, and the handler has no way of knowing
what CPU stored there, it can only access the fields that are reserved in the spec.

Yes, if the SMI handler really insists it can see that the saved RIP points to an
instruction that follows the STI, but does that really matter? It is allowed by the
spec explicitly anyway.

Plus our SMI layout (at least for 32 bit) doesn't confirm to the X86 spec anyway,
we as I found out flat out write over the fields that have other meaning in the X86 spec.

Also I proposed to preserve the int shadow in internal kvm state and migrate
it in upper 4 bits of the 'shadow' field of struct kvm_vcpu_events.
Both Paolo and Sean proposed to store the int shadow in the SMRAM instead,
and you didn't object to this, and now after I refactored and implemented
the whole thing you suddently do.

BTW, just FYI, I found out that qemu doesn't migrate the 'shadow' field,
this needs to be fixed (not related to the issue, just FYI).

> 
> The right fix for this problem is to block SMI in an interrupt shadow,
> as is likely the case for all modern CPUs.

Yes, I agree that this is the most correct fix. 

However AMD just recently posted a VNMI patch series to avoid
single stepping the CPU when NMI is blocked due to the same reason, because
it is fragile.

Do you really want KVM to single step the guest in this case, to deliver the #SMI?
I can do it, but it is bound to cause lot of trouble.

Note that I will have to do it on both Intel and AMD, as neither has support for SMI
window, unless I were to use MTF, which is broken on nested virt as you know,
so a nested hypervisor running a guest with SMI will now have to cope with broken MTF.

Note that I can't use the VIRQ hack we use for interrupt window, because there
is no guarantee that the guest's EFLAGS.IF is on.

Best regards,	
	Maxim Levitsky

> 
> > 
> > As for CPUS that neither block SMI nor preserve the int shadaw, in theory they can, but that would
> > break things, as noted in this mail
> > 
> > https://lore.kernel.org/lkml/1284913699-14986-1-git-send-email-avi@xxxxxxxxxx/
> > 
> > It is possible though that real cpu supports HLT restart flag, which makes this a non issue,
> > still. I can't rule out that a real cpu doesn't preserve the interrupt shadow on SMI, but
> > I don't see why we can't do this to make things more robust.
> 
> Because, as I said, I think you're just making stuff up...unless, of
> course, you have documentation to back this up.
>