On 6.04.2022 03:48, Sean Christopherson wrote:
On Mon, Apr 04, 2022, Maciej S. Szmigiero wrote:
(..)
Also, I'm not sure that even the proposed updated code above will
actually restore the L1-requested next_rip correctly on L1 -> L2
re-injection (will review once the full version is available).
Spoiler alert, it doesn't. Save yourself the review time. :-)
The missing piece is stashing away the injected event on nested VMRUN. Those
events don't get routed through the normal interrupt/exception injection code and
so the next_rip info is lost on the subsequent #NPF.
Treating soft interrupts/exceptions like they were injected by KVM (which they
are, technically) works and doesn't seem too gross. E.g. when prepping vmcb02
if (svm->nrips_enabled)
vmcb02->control.next_rip = svm->nested.ctl.next_rip;
else if (boot_cpu_has(X86_FEATURE_NRIPS))
vmcb02->control.next_rip = vmcb12_rip;
if (is_evtinj_soft(vmcb02->control.event_inj)) {
svm->soft_int_injected = true;
svm->soft_int_csbase = svm->vmcb->save.cs.base;
svm->soft_int_old_rip = vmcb12_rip;
if (svm->nrips_enabled)
svm->soft_int_next_rip = svm->nested.ctl.next_rip;
else
svm->soft_int_next_rip = vmcb12_rip;
}
And then the VMRUN error path just needs to clear soft_int_injected.
I am also a fan of parsing EVENTINJ from VMCB12 into relevant KVM
injection structures (much like EXITINTINFO is parsed), as I said to
Maxim two days ago [1].
Not only for software {interrupts,exceptions} but for all incoming
events (again, just like EXITINTINFO).
However, there is another issue related to L1 -> L2 event re-injection
using standard KVM event injection mechanism: it mixes the L1 injection
state with the L2 one.
Specifically for SVM:
* When re-injecting a NMI into L2 NMI-blocking is enabled in
vcpu->arch.hflags (shared between L1 and L2) and IRET intercept is
enabled.
This is incorrect, since it is L1 that is responsible for enforcing NMI
blocking for NMIs that it injects into its L2.
Also, *L2* being the target of such injection definitely should not block
further NMIs for *L1*.
* When re-injecting a *hardware* IRQ into L2 GIF is checked (previously
even on the BUG_ON() level), while L1 should be able to inject even when
L2 GIF is off,
With the code in my previous patch set I planned to use
exit_during_event_injection() to detect such case, but if we implement
VMCB12 EVENTINJ parsing we can simply add a flag that the relevant event
comes from L1, so its normal injection side-effects should be skipped.
By the way, the relevant VMX code also looks rather suspicious,
especially for the !enable_vnmi case.
Thanks,
Maciej
[1]: https://lore.kernel.org/kvm/7d67bc6f-00ac-7c07-f6c2-c41b2f0d35a1@xxxxxxxxxxxxxxxxxxxxx/