On 8/23/23 15:03, Sean Christopherson wrote:
On Wed, Aug 23, 2023, Tom Lendacky wrote:
On 8/22/23 10:14, Sean Christopherson wrote:
On Tue, Aug 22, 2023, Tom Lendacky wrote:
On 8/10/23 18:49, Sean Christopherson wrote:
Fix a bug where KVM injects a bogus #UD for SEV guests when trying to skip
an INT3 as part of re-injecting the associated #BP that got kinda sorta
intercepted due to a #NPF occuring while vectoring/delivering the #BP.
I haven't actually confirmed that patch 1 fixes the bug, as it's a
different change than what I originally proposed. I'm 99% certain it will
work, but I definitely need verification that it fixes the problem
Patch 2 is a tangentially related cleanup to make NRIPS a requirement for
enabling SEV, e.g. so that we don't ever get "bug" reports of SEV guests
not working when NRIPS is disabled.
Sean Christopherson (2):
KVM: SVM: Don't inject #UD if KVM attempts emulation of SEV guest w/o
insn
KVM: SVM: Require nrips support for SEV guests (and beyond)
arch/x86/kvm/svm/sev.c | 2 +-
arch/x86/kvm/svm/svm.c | 37 ++++++++++++++++++++-----------------
arch/x86/kvm/svm/svm.h | 1 +
3 files changed, 22 insertions(+), 18 deletions(-)
We ran some stress tests against a version of the kernel without this fix
and we're able to reproduce the issue, but not reliably, after a few hours.
With this patch, it has not reproduced after running for a week.
Not as reliable a scenario as the original reporter, but this looks like it
resolves the issue.
Thanks Tom! I'll apply this for v6.6, that'll give us plenty of time to change
course if necessary.
I may have spoke to soon... When the #UD was triggered it was here:
[ 0.118524] Spectre V2 : Enabling Restricted Speculation for firmware calls
[ 0.118524] Spectre V2 : mitigation: Enabling conditional Indirect Branch Prediction Barrier
[ 0.118524] Speculative Store Bypass: Mitigation: Speculative Store Bypass disabled via prctl
[ 0.118524] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 0.118524] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.2.2-amdsos-build50-ubuntu-20.04+ #1
[ 0.118524] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[ 0.118524] RIP: 0010:int3_selftest_ip+0x0/0x60
[ 0.118524] Code: b9 25 05 00 00 48 c7 c2 e8 7c 80 b0 48 c7 c6 fe 1c d3 b0 48 c7 c7 f0 7d da b0 e8 4c 2c 0b ff e8 75 da 15 ff 0f 0b 48 8d 7d f4 <cc> 90 90 90 90 83 7d f4 01 74 2f 80 3d 39 7f a8 00 00 74 24 b9 34
Now (after about a week) we've encountered a hang here:
[ 0.106216] Spectre V2 : Enabling Restricted Speculation for firmware calls
[ 0.106216] Spectre V2 : mitigation: Enabling conditional Indirect Branch Prediction Barrier
[ 0.106216] Speculative Store Bypass: Mitigation: Speculative Store Bypass disabled via prctl
It is in the very same spot and so I wonder if the return false (without
queuing a #UD) is causing an infinite loop here that appears as a guest
hang. Whereas, we have some systems running the first patch that you
created that have not hit this hang.
But I'm not sure why or how this patch could cause the guest hang. I
would think that the retry of the instruction would resolve everything
and the guest would continue. Unfortunately, the guest was killed, so I'll
try to reproduce and get a dump or trace points of the VM to see what is
going on.
Gah, it's because x86_emulate_instruction() returns '1' and not '0' when
svm_can_emulate_instruction() returns false. svm_update_soft_interrupt_rip()
would then continue with the injection, i.e. inject #BP on the INT3 RIP, not on
the RIP following the INT3, which would cause this check to fail
if (regs->ip - INT3_INSN_SIZE != selftest)
return NOTIFY_DONE;
and eventually send do_trap_no_signal() to die().
Thanks for figuring that out so quickly!
I distinctly remember seeing the return value problem when writing the patch, but
missed that it would result in KVM injecting the unexpected #BP.
I punted on trying to properly fix this by having can_emulate_instruction()
differentiate between "retry insn" and "inject exception", because that change
is painfully invasive and I though I could get away with the simple fix. Drat.
I think the best option is to add a "temporary" patch so that the fix for @stable
is short, sweet, and safe, and then do the can_emulate_instruction() cleanup that
I was avoiding.
E.g. this as patch 2/4 (or maybe 2/5) of this series:
2/4 or 2/5? Do you mean 2/3 since there were only 2 patches in the series?
I'll apply the below patch in between patches 1 and 2 and re-test. Should
have results in a week :)
Thanks,
Tom
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7cb5ef5835c2..8457a36b44c1 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -364,6 +364,8 @@ static void svm_set_interrupt_shadow(struct kvm_vcpu *vcpu, int mask)
svm->vmcb->control.int_state |= SVM_INTERRUPT_SHADOW_MASK;
}
+static bool svm_can_emulate_instruction(struct kvm_vcpu *vcpu, int emul_type,
+ void *insn, int insn_len);
static int __svm_skip_emulated_instruction(struct kvm_vcpu *vcpu,
bool commit_side_effects)
@@ -384,6 +386,14 @@ static int __svm_skip_emulated_instruction(struct kvm_vcpu *vcpu,
}
if (!svm->next_rip) {
+ /*
+ * FIXME: Drop this when kvm_emulate_instruction() does the
+ * right thing and treats "can't emulate" as outright failure
+ * for EMULTYPE_SKIP.
+ */
+ if (!svm_can_emulate_instruction(vcpu, EMULTYPE_SKIP, NULL, 0))
+ return 0;
+
if (unlikely(!commit_side_effects))
old_rflags = svm->vmcb->save.rflags;