On 04/04/2016 22:46, Luiz Capitulino wrote:
> When a vCPU runs on a nohz_full core, the hrtimer used by
> the lapic emulation code can be migrated to another core.
> When this happens, it's possible to observe millisecond
> latency when delivering timer IRQs to KVM guests.
>
> The huge latency is mainly due to the fact that
> apic_timer_fn() expects to run during a kvm exit. It
> sets KVM_REQ_PENDING_TIMER and lets it be handled on kvm
> entry. However, if the timer fires on a different core,
> we have to wait until the next kvm exit for the guest
> to see KVM_REQ_PENDING_TIMER set.
>
> This problem became visible after commit 9642d18ee. This
> commit changed the timer migration code to always attempt
> to migrate timers away from nohz_full cores. While it's
> debatable whether this is correct/desirable (I don't think
> it is), it's clear that the lapic emulation code
> requires the hrtimer to fire on the same core
> where it was started. This is achieved by making the
> hrtimer pinned.
>
> Lastly, note that KVM has code to migrate timers when a
> vCPU is scheduled to run on a different core. However, this
> forced migration may fail. When this happens, we can have
> the same problem. If we want 100% correctness, we'll have
> to modify apic_timer_fn() to cause a kvm exit when it runs
> on a different core than the vCPU. Not sure if this is
> possible.
>
> Here's a reproducer for the issue being fixed:
>
>  1. Set all cores but core0 to be nohz_full cores
>  2. Start a guest with a single vCPU
>  3. Trace apic_timer_fn() and kvm_inject_apic_timer_irqs()
>
> You'll see that apic_timer_fn() will run on core0 while
> kvm_inject_apic_timer_irqs() runs on a different core. If
> you get both on core0, try running a program that takes 100%
> of the CPU and pin it to core0 to force the vCPU out.
>
> Signed-off-by: Luiz Capitulino <lcapitulino@xxxxxxxxxx>
> ---
>  arch/x86/kvm/lapic.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 443d2a5..1a2da0e 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1369,7 +1369,7 @@ static void start_apic_timer(struct kvm_lapic *apic)
>
>  		hrtimer_start(&apic->lapic_timer.timer,
>  			      ktime_add_ns(now, apic->lapic_timer.period),
> -			      HRTIMER_MODE_ABS);
> +			      HRTIMER_MODE_ABS_PINNED);
>
>  		apic_debug("%s: bus cycle is %" PRId64 "ns, now 0x%016"
>  			   PRIx64 ", "
> @@ -1402,7 +1402,7 @@ static void start_apic_timer(struct kvm_lapic *apic)
>  			expire = ktime_add_ns(now, ns);
>  			expire = ktime_sub_ns(expire, lapic_timer_advance_ns);
>  			hrtimer_start(&apic->lapic_timer.timer,
> -				      expire, HRTIMER_MODE_ABS);
> +				      expire, HRTIMER_MODE_ABS_PINNED);
>  		} else
>  			apic_timer_expired(apic);
>
> @@ -1868,7 +1868,7 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu)
>  	apic->vcpu = vcpu;
>
>  	hrtimer_init(&apic->lapic_timer.timer, CLOCK_MONOTONIC,
> -		     HRTIMER_MODE_ABS);
> +		     HRTIMER_MODE_ABS_PINNED);
>  	apic->lapic_timer.timer.function = apic_timer_fn;
>
>  	/*
> @@ -2003,7 +2003,7 @@ void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu)
>
>  	timer = &vcpu->arch.apic->lapic_timer.timer;
>  	if (hrtimer_cancel(timer))
> -		hrtimer_start_expires(timer, HRTIMER_MODE_ABS);
> +		hrtimer_start_expires(timer, HRTIMER_MODE_ABS_PINNED);
>  }
>
>  /*
>

Queued for 4.6.0-rc3, thanks.

Paolo