[Bug 209155] KVM Linux guest with more than 1 CPU panics after commit 404d5d7bff0d419fe11c7eaebca9ec8f25258f95 on old CPU (Phenom x4)

bugzilla-daemon@xxxxxxxxxxxxxxxxxxx · Tue, 08 Sep 2020 17:08:14 +0000

https://bugzilla.kernel.org/show_bug.cgi?id=209155

Sean Christopherson (sean.j.christopherson@xxxxxxxxx) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sean.j.christopherson@intel
                   |                            |.com

--- Comment #8 from Sean Christopherson (sean.j.christopherson@xxxxxxxxx) ---
From code inspection, I'm 99% confident the immediate bug is that svm->next_rip
is reset in svm_vcpu_run() only after calling svm_exit_handlers_fastpath(),
which will cause SVM's skip_emulated_instruction() to write a stale RIP.  I
don't have AMD hardware to confirm, but this should be reproducible on modern
CPUs by loading kvm_amd with nrips=0.
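
To make the ordering problem concrete, here is a toy user-space model of it.
It is not kernel code: struct toy_vcpu and every toy_* function are
hypothetical stand-ins, and the model assumes only what is described above
(a cached next_rip that is trusted when non-zero and cleared only after the
fastpath has run).

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for the vCPU state discussed above; not the kernel's struct. */
struct toy_vcpu {
	uint64_t rip;      /* guest RIP as tracked by the host */
	uint64_t next_rip; /* cached "RIP after this instruction", 0 = unknown */
};

/*
 * Models skipping an instruction without hardware next-RIP support:
 * trust the cached next_rip if it is set, otherwise fall back to
 * decoding (modelled here by a caller-supplied instruction length).
 */
static void toy_skip_insn(struct toy_vcpu *v, uint64_t insn_len)
{
	assert(insn_len > 0);
	if (v->next_rip)
		v->rip = v->next_rip;
	else
		v->rip += insn_len;
}

/* Models a slow-path exit handler that records next_rip, then skips. */
static void toy_slow_exit(struct toy_vcpu *v, uint64_t insn_len)
{
	v->next_rip = v->rip + insn_len;
	toy_skip_insn(v, insn_len);
}

/*
 * Models one pass through the run loop where the exit is handled in the
 * fastpath and next_rip is cleared only afterwards (the suspect ordering).
 */
static void toy_fastpath_exit_reset_late(struct toy_vcpu *v,
					 uint64_t exit_rip, uint64_t insn_len)
{
	v->rip = exit_rip;          /* guest ran and exited at exit_rip */
	toy_skip_insn(v, insn_len); /* fastpath: may consume a stale next_rip */
	v->next_rip = 0;            /* reset happens too late */
}
```

In this model, a slow-path exit followed by a fastpath exit leaves rip at the
value cached by the earlier exit rather than just past the instruction that
caused the fastpath exit.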

That issue is easy enough to resolve, e.g. simply hoist "svm->next_rip = 0;" up
above the fastpath handling.  But, there are additional complications with
advancing rip in the fastpath as svm_complete_interrupts() consumes rip, e.g.
for NMI unmasking logic and event reinjection.  Odds are that NMI unmasking
will never "fail" as it would require the new rip to match the last IRET rip,
which would be very bizarre.  Similarly, event reinjection should also be a
non-issue in practice as the WRMSR fastpath shouldn't be reachable if KVM was
injecting an event.
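
The NMI unmasking concern can be sketched the same way. This is a toy model
under the assumption stated above, namely that NMIs are unmasked once rip no
longer matches the rip of the last intercepted IRET. struct toy_nmi_state and
toy_complete_interrupts() are hypothetical names, not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy NMI-masking state; field names are illustrative only. */
struct toy_nmi_state {
	bool iret_pending; /* an IRET was intercepted while NMIs were masked */
	uint64_t iret_rip; /* RIP of that IRET */
	bool nmi_masked;   /* NMIs currently blocked */
};

/*
 * Models the rip-dependent part of interrupt completion: unmask NMIs
 * only if the guest has moved past the intercepted IRET.
 */
static void toy_complete_interrupts(struct toy_nmi_state *s, uint64_t rip)
{
	assert(s);
	if (s->iret_pending && rip != s->iret_rip) {
		s->nmi_masked = false;
		s->iret_pending = false;
	}
}
```

If the fastpath advances rip before this check runs, the check sees the new
rip. Unmasking would only be skipped if that new rip happened to equal
iret_rip, which is the "very bizarre" case.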

All that being said, IMO, the safest play would be to first yank out the call to
handle_fastpath_set_msr_irqoff() in svm_exit_handlers_fastpath() to ensure a
clean base and to provide a safe backport patch, then move
svm_complete_interrupts() into svm_vcpu_run(), and finally move the call to
svm_exit_handlers_fastpath() down a ways and reenable
handle_fastpath_set_msr_irqoff().  Aside from resolving weirdness with rip and
fastpath, it would also align VMX and SVM with respect to completing
interrupts.
