On 16/11/17 19:37, Radim Krčmář wrote:
2017-11-11 00:37+0200, Liran Alon:
On 10/11/17 23:37, Paolo Bonzini wrote:
On 10/11/2017 19:06, Radim Krčmář wrote:
/* the PIR and ON have been set by L1. */
if (!kvm_vcpu_trigger_posted_interrupt(vcpu, true)) {
This would still fail on the exiting case.
If one VCPU was just after a VM exit, then the sender would see it
IN_GUEST_MODE, send the posted notification and return true, but the
notification would do nothing
It would cause *something*---a vmexit because the vector doesn't match
the L1 posted interrupt. Then smp_kvm_posted_intr_nested_ipi would be
invoked from vmx_handle_external_intr.
Could we detect the vector in vmx_handle_external_intr and set
pi_pending+KVM_REQ_EVENT? Or invoke a function in KVM from
smp_kvm_posted_intr_nested_ipi? Or would both be insane?...
I have actually thought about it before writing this patch. But have found
an issue with this approach (which doesn't exist in this v1 patch and in
Radim's suggestion for v2):
Consider the case sender sees vcpu->mode==IN_GUEST_MODE and before it sends
the physical IPI, dest CPU exits from guest and continues in L0 all the way
until vcpu_enter_guest() and pass the part it checks for KVM_REQ_EVENT but
before it disables interrupts. Then sender sends the physical IPI which is
received in host-context and therefore runs smp_kvm_posted_intr_nested_ipi()
which sets KVM_REQ_EVENT & pi_pending=true. But without Radim's suggestion
of checking pi_pending after interrupts disabled, this is too late as dest
CPU will not check these again until next exit from L2 guest.
I hope I didn't misunderstand something here :)
kvm_request_pending() would notice KVM_REQ_EVENT and forces the VM entry
to restart, so that wouldn't be a problem.
I just realized another complication, though: when the sender looks at
IN_GUEST_MODE and before it sends IPI, the destination can reschedule to
a different VCPU => the IPI handler cannot use the 'current VCPU' and we
have to build a list of VCPUs potentially awaiting notification vector
for every PCPU, which makes it strictly worse than just looking directly
at the ON bit, IMO.
It's funny because I have actually tried to first implement Paolo's
suggestion and noticed this exact issue when implementing the IPI
handler at host-side.
Therefore, I decided to go with Radim's suggestion. This also makes
nested posted-interrupt handling more symmetrical to non-nested
posted-interrupt handling.
I will soon post here the v2 series after I will complete more internal
review & tests on this series.
BTW, I was wondering why do we really need the vmx->nested.pi_pending.
Can't we just rely on vmx->nested.pi_desc ON bit that will be checked at
start of vmx_complete_nested_posted_interrupt()?
Thanks.
-Liran