Re: Regression in nested SVM on 4.16 (bisected)

Liran Alon <liran.alon@xxxxxxxxxx> · Sat, 9 Jun 2018 06:50:18 -0700 (PDT)

----- marcan@xxxxxxxxx wrote:

> On 2018-06-09 07:48, Liran Alon wrote:
> > So I think the path to how to further bisect bug is very clear here:
> >
> > 1) First, attempt to change rsm_interception() to pass
> > EMULTYPE_NO_REEXECUTE and see if it makes a difference. (BTW, you can
> > submit a commit that adds this EMULTYPE_NO_REEXECUTE as it should be
> > present here)
> > 2) If that doesn't work, attempt to remove rsm_ins_bytes and instead
> > pass NULL. If this works, this means that there are cases which raise
> > RSM interception on bytes different than "\x0f\xaa".
> 
> Neither of those help.

Yes, I see why. I missed that commit 7607b7174405 ("KVM: SVM: install RSM intercept") also added
the line "set_intercept(svm, INTERCEPT_RSM);" to init_vmcb().
Before this commit, RSM was not intercepted at all, even though it had a handler in svm_exit_handlers[],
which is what confused me.
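To make the effect of that one init_vmcb() line concrete, here is a toy user-space model (not kernel code) of which exit L0 observes when a guest executes RSM outside SMM, with and without the RSM intercept. All toy_* names and the bit positions are hypothetical; the model also assumes, as the traces in this thread suggest, that L0 intercepts #UD and that the RSM intercept fires before the CPU would raise #UD.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions; the real INTERCEPT_* values are not reproduced. */
enum toy_intercept_bit { TOY_INTERCEPT_UD = 0, TOY_INTERCEPT_RSM = 1 };

/* Which exit L0 sees when the guest executes RSM while not in SMM. */
enum toy_exit { TOY_EXIT_NONE, TOY_EXIT_RSM, TOY_EXIT_UD };

struct toy_vmcb {
	uint64_t intercept;  /* one bit per intercepted event */
};

static void toy_set_intercept(struct toy_vmcb *vmcb, enum toy_intercept_bit bit)
{
	assert((unsigned)bit < 64);
	vmcb->intercept |= (uint64_t)1 << bit;
}

static bool toy_is_intercept(const struct toy_vmcb *vmcb, enum toy_intercept_bit bit)
{
	return (vmcb->intercept & ((uint64_t)1 << bit)) != 0;
}

/* Toy init_vmcb(): #UD is always intercepted; RSM only with the commit. */
static void toy_init_vmcb(struct toy_vmcb *vmcb, bool with_rsm_commit)
{
	vmcb->intercept = 0;
	toy_set_intercept(vmcb, TOY_INTERCEPT_UD);
	if (with_rsm_commit)
		toy_set_intercept(vmcb, TOY_INTERCEPT_RSM);
}

static enum toy_exit toy_exit_for_rsm(const struct toy_vmcb *vmcb)
{
	if (toy_is_intercept(vmcb, TOY_INTERCEPT_RSM))
		return TOY_EXIT_RSM;  /* "reason rsm" exit, handled by the RSM handler */
	if (toy_is_intercept(vmcb, TOY_INTERCEPT_UD))
		return TOY_EXIT_UD;   /* CPU raises #UD, which L0 intercepts */
	return TOY_EXIT_NONE;         /* #UD would go straight to the guest */
}
```

Without the commit, the handler in svm_exit_handlers[] is simply never reached: the guest's RSM surfaces as a #UD exit instead.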

> 
> > Anyway, having a look at:
> > # echo 1 > /sys/kernel/debug/tracing/events/kvm/enable
> > # cat /sys/kernel/debug/tracing/trace_pipe
> > should help debug the issue in case you discover this patch wasn't
> > the root cause.
> 
> Tracing further, it seems the issue is that L0 is completely unaware of
> when L2 enters SMM mode. It doesn't know about the right SMBASE, it
> doesn't know about whether L2 is in SMM or not. The emulation delivers a
> #UD and then L2 triggers a shutdown:
> 
> d..1 27652.855246: kvm_entry: vcpu 4
> .... 27652.855248: kvm_exit: reason rsm rip 0xfd399 info 0 0
> .... 27652.855248: kvm_nested_vmexit: rip: 0x00000000000fd399 reason: rsm ext_inf1: 0x0000000000000000 ext_inf2: 0x0000000000000000 ext_int: 0x00000000 ext_int_err: 0x00000000
> .... 27652.855254: kvm_emulate_insn: 0:fd399:0f aa (prot32)
> .... 27652.855257: kvm_inj_exception: #UD (0x0)
> d..1 27652.855258: kvm_entry: vcpu 4
> .... 27652.855259: kvm_exit: reason shutdown rip 0xfd399 info 0 0
> .... 27652.855259: kvm_nested_vmexit: rip: 0x00000000000fd399 reason: shutdown ext_inf1: 0x0000000000000000 ext_inf2: 0x0000000000000000 ext_int: 0x80000b08 ext_int_err: 0x00000000
> .... 27652.855259: kvm_nested_vmexit_inject: reason: shutdown ext_inf1: 0x0000000000000000 ext_inf2: 0x0000000000000000 ext_int: 0x80000b08 ext_int_err: 0x00000000
> d..1 27652.855260: kvm_entry: vcpu 4
> 
> L0 has no idea that L2 is in SMM (when L1 boots I do see SMM
> entries/exits and correctly emulated rsm intercepts).
> 
> But without the bad commit I get this:
> 
> .... 12724.894359: kvm_exit: reason UD excp rip 0xfd399 info 0 0
> .... 12724.894359: kvm_nested_vmexit: rip: 0x00000000000fd399 reason: UD excp ext_inf1: 0x0000000000000000 ext_inf2: 0x0000000000000000 ext_int: 0x00000000 ext_int_err: 0x00000000
> .... 12724.894359: kvm_nested_vmexit_inject: reason: UD excp ext_inf1: 0x0000000000000000 ext_inf2: 0x0000000000000000 ext_int: 0x00000000 ext_int_err: 0x00000000
> 
> So it's still a #UD. But it seems the problem here is that when the RSM
> handler triggers a #UD because it thinks the guest isn't in SMM mode
> (which is fine if that were delivered to L1, since L1 knows how to
> handle it), it gets delivered straight to L2. Without the RSM intercept,
> the #UD triggers a nested vmexit, and things work.

So, to summarize your finding: the root cause of the bug is that the #UD raised by the RSM emulation is injected directly into L2, instead of causing a VMExit to L1, even though L1 intercepts #UD.

Before the bad commit, RSM wasn't intercepted by L0; therefore, when L2 executed it, a #UD was
raised and intercepted by L0. On that #UD exit, handle_exit() -> nested_svm_exit_handled() ->
nested_svm_intercept() was executed, which returned NESTED_EXIT_DONE. This caused nested_svm_exit_handled()
to call nested_svm_vmexit(), which injected the #UD into L1. L1's #UD VMExit handler was then able to emulate
RSM as required.
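The routing decision in that path can be sketched as a small user-space model. This is only an illustration under simplifying assumptions: the toy_* names are hypothetical stand-ins that mirror, but are not, nested_svm_intercept() and nested_svm_exit_handled(), and L1's intercepts are reduced to two booleans.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_exit_code { TOY_EXIT_EXCP_UD, TOY_EXIT_RSM };
enum toy_nested { TOY_NESTED_EXIT_HOST, TOY_NESTED_EXIT_DONE };
enum toy_handler { TOY_HANDLED_BY_L0, TOY_HANDLED_BY_L1 };

/* What L1 asked to intercept in the VMCB it uses to run L2. */
struct toy_l1_intercepts {
	bool ud;   /* L1 intercepts the #UD exception */
	bool rsm;  /* L1 intercepts the RSM instruction */
};

/* Stand-in for nested_svm_intercept(): does L1 want this exit? */
static enum toy_nested toy_nested_intercept(const struct toy_l1_intercepts *l1,
					    enum toy_exit_code code)
{
	switch (code) {
	case TOY_EXIT_EXCP_UD:
		return l1->ud ? TOY_NESTED_EXIT_DONE : TOY_NESTED_EXIT_HOST;
	case TOY_EXIT_RSM:
		return l1->rsm ? TOY_NESTED_EXIT_DONE : TOY_NESTED_EXIT_HOST;
	}
	assert(!"unknown exit code");
	return TOY_NESTED_EXIT_HOST;
}

/* Stand-in for handle_exit() -> nested_svm_exit_handled(): either the
 * exit is reflected to L1 (nested_svm_vmexit()) or L0 handles it itself. */
static enum toy_handler toy_handle_l2_exit(const struct toy_l1_intercepts *l1,
					   enum toy_exit_code code)
{
	if (toy_nested_intercept(l1, code) == TOY_NESTED_EXIT_DONE)
		return TOY_HANDLED_BY_L1;
	return TOY_HANDLED_BY_L0;
}
```

With an L1 that intercepts #UD but not RSM, a #UD exit from L2 is reflected to L1, while an RSM exit stays in L0.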

However, after the bad commit, L0 intercepts RSM, and nested_svm_intercept() must have returned NESTED_EXIT_HOST, because your trace shows no kvm_nested_vmexit_inject for the RSM exit.
This makes sense if L1 doesn't contain this commit and therefore doesn't intercept RSM.
In that case, L0 emulates RSM and, because it sees that L2 isn't in SMM mode, it indeed queues a #UD to be raised.
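The post-commit case can be modeled the same way. This is a hypothetical sketch, not the emulator's code: the toy_* names are illustrative, and "L0 sees SMM" stands in for whatever SMM state L0 tracks for the vCPU, which in this scenario does not reflect L2's SMM state.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_outcome { TOY_REFLECT_RSM_TO_L1, TOY_RSM_EMULATED, TOY_UD_QUEUED };

struct toy_state {
	bool l1_intercepts_rsm;  /* L1's VMCB has the RSM intercept set */
	bool l0_sees_smm;        /* L0's own view: is this vCPU in SMM? */
};

/* What L0 does with an RSM exit taken while running L2. */
static enum toy_outcome toy_l0_rsm_exit(const struct toy_state *s)
{
	if (s->l1_intercepts_rsm)
		return TOY_REFLECT_RSM_TO_L1;  /* like NESTED_EXIT_DONE */
	/* Like NESTED_EXIT_HOST: L0 emulates the RSM itself. */
	if (!s->l0_sees_smm)
		return TOY_UD_QUEUED;          /* RSM outside SMM -> #UD */
	return TOY_RSM_EMULATED;
}
```

An L1 without the commit plus an L0 that doesn't know L2 is in SMM yields a queued #UD, which is what the kvm_inj_exception line in the trace shows.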

For nVMX, you are correct that this #UD would then have been detected by vmx_check_nested_events(), which would have called nested_vmx_inject_exception_vmexit().
For nSVM, inject_pending_event() calls svm_queue_exception(), which calls nested_svm_check_exception(). There, nested_svm_intercept() returns NESTED_EXIT_DONE, which sets svm->nested.exit_required to true.
Then, when svm_vcpu_run() is called to enter the guest, it sees that svm->nested.exit_required is true and therefore returns without entering L2. Next, svm.c handle_exit() is called; it sees that svm->nested.exit_required is true and therefore calls nested_svm_vmexit(), which should inject the #UD into L1 as required.
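The deferred-exit sequence for nSVM can be sketched as a toy state machine. This is a heavily simplified, hypothetical model: the toy_* functions only stand in for svm_queue_exception()/nested_svm_check_exception(), svm_vcpu_run(), handle_exit() and vcpu_enter_guest(), and the counters are illustrative. It also counts "kvm_entry"-style trace points separately from real L2 entries, since the trace point fires before run() is called.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_vcpu {
	bool in_l2;              /* currently running the nested (L2) guest */
	bool l1_intercepts_ud;   /* L1's VMCB asks for #UD exits */
	bool exit_required;      /* mirrors svm->nested.exit_required */
	int  pending_ud;         /* #UD queued for injection into the current guest */
	int  ud_reflected_to_l1; /* #VMEXITs delivered to L1 for the #UD */
	int  entry_traces;       /* trace points emitted before run() */
	int  l2_entries;         /* times L2 actually ran */
};

/* Stand-in for svm_queue_exception() -> nested_svm_check_exception(). */
static void toy_queue_ud(struct toy_vcpu *v)
{
	if (v->in_l2 && v->l1_intercepts_ud) {
		v->exit_required = true;  /* defer: reflect to L1 instead */
		return;
	}
	v->pending_ud++;                  /* inject into the current guest */
}

/* Stand-in for svm_vcpu_run(): return early without entering L2. */
static void toy_vcpu_run(struct toy_vcpu *v)
{
	if (v->exit_required)
		return;
	if (v->in_l2)
		v->l2_entries++;
}

/* Stand-in for handle_exit() noticing exit_required. */
static void toy_handle_exit(struct toy_vcpu *v)
{
	if (v->exit_required) {
		assert(v->in_l2);        /* only set while running L2 */
		v->exit_required = false;
		v->in_l2 = false;        /* like nested_svm_vmexit(): back to L1 */
		v->ud_reflected_to_l1++;
	}
}

/* Stand-in for vcpu_enter_guest(): the trace fires before run(). */
static void toy_enter_guest(struct toy_vcpu *v)
{
	v->entry_traces++;
	toy_vcpu_run(v);
	toy_handle_exit(v);
}
```

When L1 intercepts #UD, one trace point is emitted but L2 never runs, and the #UD ends up as a #VMEXIT to L1.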

From the above, note that the kvm_entry trace is a bit misleading: it is printed in vcpu_enter_guest() before kvm_x86_ops->run() is called, but in this case svm_vcpu_run() never actually entered the L2 guest.

Therefore, I'm still not sure that the #UD was actually injected directly into L2.
I recommend adding more prints along the code path described above to debug the issue.

Hope it helps,
-Liran