> On 12 Apr 2019, at 23:18, Sean Christopherson <sean.j.christopherson@xxxxxxxxx> wrote: > > To minimize the latency of timer interrupts as observed by the guest, > KVM adjusts the values it programs into the host timers to account for > the host's overhead of programming and handling the timer event. Now > that the timer advancement is automatically tuned during runtime, it's > effectively unbounded by default, e.g. if KVM is running as L1 the > advancement can measure in hundreds of milliseconds. > > Place a somewhat arbitrary hard cap of 5000ns on the auto-calculated > advancement, as large advancements can break reasonable assumptions of > the guest, e.g. that a timer configured to fire after 1ms won't arrive > on the next instruction. Although KVM busy waits to mitigate the timer > event arriving too early, complications can arise when shifting the > interrupt too far, e.g. vmx.flat/interrupt in kvm-unit-tests will fail > when its "host" exits on interrupts (because the INTR is injected before > the gets executes STI+HLT). Arguably the unit test is "broken" in the > sense that delaying the timer interrupt by 1ms doesn't technically > guarantee the interrupt will arrive after STI+HLT, but it's a reasonable > assumption that KVM should support. > > Furthermore, an unbounded advancement also effectively unbounds the time > spent busy waiting, e.g. if the guest programs a timer with a very large > delay. > > Arguably the advancement logic could simply be disabled when running as > L1, but KVM needs to bound the advancement time regardless, e.g. if the > TSC is unstable and the calculations get wildly out of whack. And > allowing the advancement when running as L1 is a good stress test of > sorts for the logic. > > Cc: Liran Alon <liran.alon@xxxxxxxxxx> > Cc: Wanpeng Li <wanpengli@xxxxxxxxxxx> > Fixes: 3b8a5df6c4dc6 ("KVM: LAPIC: Tune lapic_timer_advance_ns automatically") > Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx> I completely agree with the above. The below change also seems correct to me. Reviewed-by: Liran Alon <liran.alon@xxxxxxxxxx> Having said that, I think we are starting to have too many hard-coded parameters for the algorithm adjusting lapic_timer_advance_ns. (LAPIC_TIMER_ADVANCE_ADJUST_DONE, LAPIC_TIMER_ADVANCE_ADJUST_STEP and now LAPIC_TIMER_ADVANCE_MAX_NS). We should consider changing them to KVM module parameters. In addition, this patch led me wondering what will happen when start_sw_tscdeadline() will sub lapic_timer_advance_ns from “expire” such that (expire < now). I think that because apic->lapic_timer.timer is init with HRTIMER_MODE_ABS_PINNED, it is not allowed to run in softirq and therefore will never expire. Thus, I think that start_sw_tscdeadline() should check if (expire < now) after sub of lapic_timer_advance_ns and if so call apic_timer_expired() instead of hrtimer_start(). I can submit such a patch if you agree. -Liran > --- > arch/x86/kvm/lapic.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > index 9bf70cf84564..92446cba9b24 100644 > --- a/arch/x86/kvm/lapic.c > +++ b/arch/x86/kvm/lapic.c > @@ -74,6 +74,7 @@ static bool lapic_timer_advance_adjust_done = false; > #define LAPIC_TIMER_ADVANCE_ADJUST_DONE 100 > /* step-by-step approximation to mitigate fluctuation */ > #define LAPIC_TIMER_ADVANCE_ADJUST_STEP 8 > +#define LAPIC_TIMER_ADVANCE_MAX_NS 5000 > > static inline int apic_test_vector(int vec, void *bitmap) > { > @@ -1522,6 +1523,10 @@ void wait_lapic_expire(struct kvm_vcpu *vcpu) > } > if (abs(guest_tsc - tsc_deadline) < LAPIC_TIMER_ADVANCE_ADJUST_DONE) > lapic_timer_advance_adjust_done = true; > + if (unlikely(lapic_timer_advance_ns > LAPIC_TIMER_ADVANCE_MAX_NS)) { > + lapic_timer_advance_ns = LAPIC_TIMER_ADVANCE_MAX_NS; > + lapic_timer_advance_adjust_done = true; > + } > } > } > > -- > 2.21.0 >