Re: kvm guest: hrtimer: interrupt too slow

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



(Adding Thomas in Cc)


On Sat, Oct 03, 2009 at 08:12:05PM -0300, Marcelo Tosatti wrote:
> Michael,
> 
> Can you please give the patch below a try please? (without acpi_pm timer 
> or priority adjustments for the guest).
> 
> On Tue, Sep 29, 2009 at 05:12:17PM +0400, Michael Tokarev wrote:
> > Hello.
> >
> > I'm having quite an.. unusable system here.
> > It's not really a regresssion with 0.11.0,
> > it was something similar before, but with
> > 0.11.0 and/or 2.6.31 it become much worse.
> >
> > The thing is that after some uptime, kvm
> > guest prints something like this:
> >
> > hrtimer: interrupt too slow, forcing clock min delta to 461487495 ns
> >
> > after which system (guest) speeed becomes
> > very slow.  The above message is from
> > 2.6.31 guest running wiht 0.11.0 & 2.6.31
> > host.  Before I tried it with 0.10.6 and
> > 2.6.30 or 2.6.27, and the delta were a
> > bit less than that:
> >
> > hrtimer: interrupt too slow, forcing clock min delta to 152222415 ns
> > hrtimer: interrupt too slow, forcing clock min delta to 93629025 ns
> 
> It seems the way hrtimer_interrupt_hanging calculates min_delta is
> wrong (especially to virtual machines). The guest vcpu can be scheduled
> out during the execution of the hrtimer callbacks (and the callbacks
> themselves can do operations that translate to blocking operations in
> the hypervisor).
> 
> So high min_delta values can be calculated if, for example, a single
> hrtimer_interrupt run takes two host time slices to execute, while some
> other higher priority task runs for N slices in between.
> 
> Using the hrtimer_interrupt execution time (which can be the worse
> case at any given time), as the min_delta is problematic.
> 
> So simply increase min_delta_ns by 50% once every detected failure,
> which will eventually lead to an acceptable threshold (the algorithm
> should scale back to down lower min_delta, to adjust back to wealthier
> times, too).
> 
> diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
> index 49da79a..8997978 100644
> --- a/kernel/hrtimer.c
> +++ b/kernel/hrtimer.c
> @@ -1234,28 +1234,20 @@ static void __run_hrtimer(struct hrtimer *timer)
>  
>  #ifdef CONFIG_HIGH_RES_TIMERS
>  
> -static int force_clock_reprogram;
> -
>  /*
>   * After 5 iteration's attempts, we consider that hrtimer_interrupt()
>   * is hanging, which could happen with something that slows the interrupt
> - * such as the tracing. Then we force the clock reprogramming for each future
> - * hrtimer interrupts to avoid infinite loops and use the min_delta_ns
> - * threshold that we will overwrite.
> - * The next tick event will be scheduled to 3 times we currently spend on
> - * hrtimer_interrupt(). This gives a good compromise, the cpus will spend
> - * 1/4 of their time to process the hrtimer interrupts. This is enough to
> - * let it running without serious starvation.
> + * such as the tracing, so we increase min_delta_ns.
>   */
>  
>  static inline void
> -hrtimer_interrupt_hanging(struct clock_event_device *dev,
> -			ktime_t try_time)
> +hrtimer_interrupt_hanging(struct clock_event_device *dev)
>  {
> -	force_clock_reprogram = 1;
> -	dev->min_delta_ns = (unsigned long)try_time.tv64 * 3;
> -	printk(KERN_WARNING "hrtimer: interrupt too slow, "
> -		"forcing clock min delta to %lu ns\n", dev->min_delta_ns);
> +	dev->min_delta_ns += dev->min_delta_ns >> 1;


I haven't thought about the guest that could be scheduled out in
the middle of the timers servicing, making wrong this check based
of the time spent in hrtimer_interrupt().

I guess there is no easy/generic/cheap way to rebase this check
on the _virtual_ time spent in the timers servicing. By virtual,
I mean the time spent in the guest only.

In a non-guest kernel, the old check forces an adaptive rate
sharing:

- we spent n nanosecs to service the batch of timers.
- we are hanging
- we want at least 3/4 of time reserved for non-timer
  servicing in the kernel, this is a minimum prerequisite
  for the system to not starve
- adapt the min_clock_delta against to fit the above constraint

All that does not make sense anymore in a guest. The hang detection
and warnings, the recalibrations of the min_clock_deltas are completely
wrong in this context.
Not only does it spuriously warn, but the minimum timer is increasing
slowly and the guest progressively suffers from higher and higher
latencies.

That's really bad.

Your patch lowers the immediate impact and makes this illness evolving
smoother by scaling down the recalibration to the min_clock_delta.
This appeases the bug but doesn't solve it. I fear it could be even
worse because it makes it more discreet.


May be can we instead increase the minimum threshold of loop in the
hrtimer interrupt before considering it as a hang? Hmm, but a too high
number could make this check useless, depending of the number of pending
timers, which is a finite number.

Well, actually I'm not confident anymore in this check. Or actually we
should change it. May be we can rebase it on the time spent on the hrtimer
interrupt (and check it every 10 loops of reprocessing in hrtimer_interrupts).

Would a mimimum threshold of 5 seconds spent in hrtimer_interrupt() be
a reasonable check to perform?
We should probably base our check on such kind of high boundary.
What we want is an ultimate rescue against hard hangs anyway, not
something that can solve the hang source itself. After the min_clock_delta
recalibration, the system will be unstable (eg: high latencies).
So if this must behave as a hammer, let's ensure we really need this hammer,
even if we need to wait for few seconds before it triggers.


What do you think?

Thanks,
Frederic.




> +	if (printk_ratelimit())
> +		printk(KERN_WARNING "hrtimer: interrupt too slow, "
> +			"forcing clock min delta to %lu ns\n",
> +			dev->min_delta_ns);
>  }
>  /*
>   * High resolution timer interrupt
> @@ -1276,7 +1268,7 @@ void hrtimer_interrupt(struct clock_event_device *dev)
>   retry:
>  	/* 5 retries is enough to notice a hang */
>  	if (!(++nr_retries % 5))
> -		hrtimer_interrupt_hanging(dev, ktime_sub(ktime_get(), now));
> +		hrtimer_interrupt_hanging(dev);
>  
>  	now = ktime_get();
>  
> @@ -1342,7 +1334,7 @@ void hrtimer_interrupt(struct clock_event_device *dev)
>  
>  	/* Reprogramming necessary ? */
>  	if (expires_next.tv64 != KTIME_MAX) {
> -		if (tick_program_event(expires_next, force_clock_reprogram))
> +		if (tick_program_event(expires_next, 0))
>  			goto retry;
>  	}
>  }

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux