On 8/22/23 10:14, Sean Christopherson wrote:
On Tue, Aug 22, 2023, Tom Lendacky wrote:
On 8/10/23 18:49, Sean Christopherson wrote:
Fix a bug where KVM injects a bogus #UD for SEV guests when trying to skip
an INT3 as part of re-injecting the associated #BP that got kinda sorta
intercepted due to a #NPF occurring while vectoring/delivering the #BP.
I haven't actually confirmed that patch 1 fixes the bug, as it's a
different change than what I originally proposed. I'm 99% certain it will
work, but I definitely need verification that it fixes the problem.
Patch 2 is a tangentially related cleanup to make NRIPS a requirement for
enabling SEV, e.g. so that we don't ever get "bug" reports of SEV guests
not working when NRIPS is disabled.
Sean Christopherson (2):
  KVM: SVM: Don't inject #UD if KVM attempts emulation of SEV guest w/o
    insn
  KVM: SVM: Require nrips support for SEV guests (and beyond)

 arch/x86/kvm/svm/sev.c |  2 +-
 arch/x86/kvm/svm/svm.c | 37 ++++++++++++++++++++-----------------
 arch/x86/kvm/svm/svm.h |  1 +
 3 files changed, 22 insertions(+), 18 deletions(-)
We ran some stress tests against a version of the kernel without this fix
and were able to reproduce the issue, but not reliably, after a few hours.
With this patch, it has not reproduced after running for a week.
Not as reliable a scenario as the original reporter, but this looks like it
resolves the issue.
Thanks Tom! I'll apply this for v6.6, that'll give us plenty of time to change
course if necessary.
I may have spoken too soon... When the #UD was triggered it was here:
[ 0.118524] Spectre V2 : Enabling Restricted Speculation for firmware calls
[ 0.118524] Spectre V2 : mitigation: Enabling conditional Indirect Branch Prediction Barrier
[ 0.118524] Speculative Store Bypass: Mitigation: Speculative Store Bypass disabled via prctl
[ 0.118524] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 0.118524] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.2.2-amdsos-build50-ubuntu-20.04+ #1
[ 0.118524] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[ 0.118524] RIP: 0010:int3_selftest_ip+0x0/0x60
[ 0.118524] Code: b9 25 05 00 00 48 c7 c2 e8 7c 80 b0 48 c7 c6 fe 1c d3 b0 48 c7 c7 f0 7d da b0 e8 4c 2c 0b ff e8 75 da 15 ff 0f 0b 48 8d 7d f4 <cc> 90 90 90 90 83 7d f4 01 74 2f 80 3d 39 7f a8 00 00 74 24 b9 34
Now (after about a week) we've encountered a hang here:
[ 0.106216] Spectre V2 : Enabling Restricted Speculation for firmware calls
[ 0.106216] Spectre V2 : mitigation: Enabling conditional Indirect Branch Prediction Barrier
[ 0.106216] Speculative Store Bypass: Mitigation: Speculative Store Bypass disabled via prctl
It is in the very same spot, so I wonder if the return false (without
queuing a #UD) is causing an infinite loop here that appears as a guest
hang. By contrast, we have some systems running the first patch that you
created, and they have not hit this hang.
But I'm not sure why or how this patch could cause the guest hang. I
would think that the retry of the instruction would resolve everything
and the guest would continue. Unfortunately, the guest was killed, so I'll
try to reproduce and get a dump or trace points of the VM to see what is
going on.
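
To make the two possibilities above concrete, here is a toy model in C of
the retry behavior. It is only a sketch and not KVM code: every name in it
(toy_vcpu, toy_handle_exit, toy_run_int3, faults_left) is made up for
illustration, and it assumes the only thing that matters is whether the
fault hit while delivering the #BP eventually clears. If it clears, the
retried INT3 completes; if it never clears, resuming without a #UD re-runs
the same INT3 forever, which would look like a guest hang.

```c
#include <stdbool.h>

/* Toy model only: none of these names exist in KVM. */
enum toy_exit_action { TOY_RESUME_GUEST, TOY_INJECT_UD };

struct toy_vcpu {
	unsigned long rip;	/* guest instruction pointer, sitting on the INT3 */
	int faults_left;	/* times #BP delivery still faults; negative = forever */
};

/*
 * Models the choice under discussion: with no instruction bytes available,
 * either queue a #UD (old behavior) or just resume the guest so that it
 * retries the instruction (the "return false" path).
 */
static enum toy_exit_action toy_handle_exit(bool queue_ud)
{
	return queue_ud ? TOY_INJECT_UD : TOY_RESUME_GUEST;
}

/*
 * Returns the number of guest entries needed for the INT3 to complete,
 * -2 if a #UD was injected instead, or -1 if max_entries ran out with the
 * guest still stuck on the same RIP (the hypothesized hang).
 */
int toy_run_int3(struct toy_vcpu *vcpu, bool queue_ud, int max_entries)
{
	for (int entries = 1; entries <= max_entries; entries++) {
		if (vcpu->faults_left == 0) {
			vcpu->rip += 1;	/* INT3 (0xcc) is a one-byte instruction */
			return entries;
		}
		if (vcpu->faults_left > 0)
			vcpu->faults_left--;
		if (toy_handle_exit(queue_ud) == TOY_INJECT_UD)
			return -2;
		/* Resumed with RIP unchanged: the same INT3 executes again. */
	}
	return -1;
}
```

With faults_left = 1 and queue_ud = false the model makes progress on the
second entry, which is the outcome I would expect from a retry. With a
negative faults_left it never advances RIP, so a trace showing the same RIP
on every exit would point at a condition that is not clearing.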
Thanks,
Tom