Re: [PATCH] timers/nohz: introduce nohz_full_aggressive

[ Added Anna-Maria who is doing some timer work as well ]

On Sun,  7 May 2023 11:07:00 +0200
Andrea Righi <andrea.righi@xxxxxxxxxxxxx> wrote:

> Overview:
> 
> nohz_full is a feature that allows reducing the number of CPU tick
> interrupts, thereby improving energy efficiency and reducing kernel
> jitter.

Hmm, I never thought of NOHZ_FULL as being used for energy efficiency,
as the CPU is still running user space code, and there's really nothing
about the tick that is inherently power hungry.

> 
> This works by stopping the tick interrupts on CPUs that are either
> idle or have only one runnable task (there is no reason to
> periodically interrupt the execution of a single running task if no
> other task is waiting for the same CPU).
> 
> It is not possible to configure all the available CPUs to work in
> nohz_full mode: at least one non-adaptive-tick CPU must be periodically
> interrupted to properly handle timekeeping tasks in the system (such as
> making the gettimeofday() syscall return accurate values).

Do we really need nohz_full? Instead, I think you want to look at what
Anna-Maria is doing with moving the timer "manager" around to make sure
that the tick stays on busy CPUs.

Again, nohz_full is not for power consumption savings, but instead to
reduce kernel interruption in user space.

> 
> However, under certain conditions, we may want to relax this constraint,
> accepting potential time inaccuracies in the system, in order to gain
> additional benefits in terms of power consumption and performance, and
> to reduce kernel jitter even more.
> 
> For this reason, introduce the new parameter nohz_full_aggressive.
> 
> This option enforces nohz_full across all CPUs (even the
> timekeeping CPU) at the cost of potential timer inaccuracies in
> the system.
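As a rough illustration of the rules described above, here is a hypothetical user-space model of when a CPU may stop its tick, with and without nohz_full_aggressive. It is not the logic in kernel/time/tick-sched.c: struct cpu_model and model_may_stop_tick() are invented for this sketch, nohz_full is assumed to be active on the system, and RCU, softirqs and tick dependencies are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model, NOT kernel code: assumes nohz_full is active and
 * ignores RCU, softirqs and tick dependencies.
 */
struct cpu_model {
	bool nohz_full;          /* CPU is in the nohz_full= mask */
	bool timekeeper;         /* CPU currently owns timekeeping duty */
	unsigned int nr_running; /* number of runnable tasks */
};

static bool model_may_stop_tick(const struct cpu_model *c, bool aggressive)
{
	assert(c != NULL);
	/* The timekeeper keeps ticking unless nohz_full_aggressive is set. */
	if (c->timekeeper && !aggressive)
		return false;
	/* An idle CPU has no reason to be interrupted periodically. */
	if (c->nr_running == 0)
		return true;
	/* A busy CPU may stop the tick only in nohz_full mode with one task. */
	return c->nohz_full && c->nr_running == 1;
}
```

With aggressive set to false the timekeeping CPU always keeps its tick, which is the constraint the new parameter relaxes.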
> 
> Test:
> 
>  - Hardware: Dell XPS 13 7390 w/ 8 cores
> 
>  - Kernel is using CONFIG_HZ=1000 (worst case scenario in terms of
>    power consumption and kernel jitter) and nohz_full=all
> 
>  - Measure interrupts and power consumption when the system is idle and
>    with 1, 2, 4 and 8 CPU hogs
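The patch does not say which tool generated the load. A minimal sketch of an equivalent "CPU hog" driver, assuming a POSIX system, forks N processes that each busy-loop for a fixed time; run_cpu_hogs() and spin_for() are hypothetical names, not part of the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Busy-loop until 'seconds' of CLOCK_MONOTONIC time have elapsed. */
static void spin_for(double seconds)
{
	struct timespec start, now;

	clock_gettime(CLOCK_MONOTONIC, &start);
	do {
		clock_gettime(CLOCK_MONOTONIC, &now);
	} while ((now.tv_sec - start.tv_sec) +
		 (now.tv_nsec - start.tv_nsec) / 1e9 < seconds);
}

/* Fork n hogs that spin for 'seconds'; return 0 if all exited cleanly. */
static int run_cpu_hogs(int n, double seconds)
{
	int started = 0, failed = 0;

	assert(n > 0);
	for (; started < n; started++) {
		pid_t pid = fork();

		if (pid < 0) {
			failed = 1;
			break;
		}
		if (pid == 0) {
			spin_for(seconds);
			_exit(0);
		}
	}
	/* Reap every child that was started (assumes no other children). */
	for (int i = 0; i < started; i++) {
		int status;

		if (wait(&status) < 0 || !WIFEXITED(status) ||
		    WEXITSTATUS(status) != 0)
			failed = 1;
	}
	return failed ? -1 : 0;
}
```

Pinning each hog to a specific CPU (for example with taskset) is left out for brevity.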
> 
> Result:
> 
> The following numbers were collected using turbostat and dstat,
> averaging over a 5-minute run for each test.
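The patch does not describe exactly which counters were sampled. One plausible way to reproduce an irqs/sec figure, sketched here under that assumption, is to read the total interrupt count (the first number on the "intr" line of /proc/stat) twice and divide the difference by the interval. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Total interrupt count from a /proc/stat "intr" line, or -1. */
static long long parse_intr_total(const char *line)
{
	char *end;
	long long total;

	if (strncmp(line, "intr ", 5) != 0)
		return -1;
	total = strtoll(line + 5, &end, 10);
	return end == line + 5 ? -1 : total;
}

/* Read the current total from /proc/stat; returns -1 on failure. */
long long read_intr_total(void)
{
	char buf[4096];
	long long total = -1;
	FILE *f = fopen("/proc/stat", "r");

	if (!f)
		return -1;
	/* The intr line is long and fgets() may split it into chunks,
	 * but only the first chunk starts with "intr ". */
	while (total < 0 && fgets(buf, sizeof(buf), f))
		total = parse_intr_total(buf);
	fclose(f);
	return total;
}

/* Average interrupt rate between two samples taken 'seconds' apart. */
static double irq_rate(long long before, long long after, double seconds)
{
	assert(seconds > 0.0);
	return (double)(after - before) / seconds;
}
```

For a 5-minute run, one would call read_intr_total() before and after sleep(300) and pass both values to irq_rate() with seconds = 300.0.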
> 
> irqs/sec                    idle   1 CPU hog  2 CPU hogs  4 CPU hogs  8 CPU hogs
>                     ------------------------------------------------------------
> nohz_full               1036.679    1047.522    1046.203    1048.590    1074.867
> nohz_full_aggressive      98.685     106.296     127.587     146.586    1062.277
> 
> Power (Watt)                idle   1 CPU hog  2 CPU hogs  4 CPU hogs  8 CPU hogs
>                     ------------------------------------------------------------
> nohz_full                0.502 W     3.436 W     3.755 W     6.187 W     6.019 W
> nohz_full_aggressive     0.301 W     2.372 W     2.372 W     6.005 W     6.016 W
> 
> % power reduction         40.04%      30.97%      36.83%       2.94%       0.05%
> 

Nice.
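The "% power reduction" row is the relative saving of nohz_full_aggressive over plain nohz_full, (P_nohz_full - P_aggressive) / P_nohz_full * 100. This small C sketch recomputes it from the power figures in the table; the array and function names are only for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Power (W) from the table: idle, 1, 2, 4 and 8 CPU hogs. */
static const double nohz_full_w[]  = { 0.502, 3.436, 3.755, 6.187, 6.019 };
static const double aggressive_w[] = { 0.301, 2.372, 2.372, 6.005, 6.016 };
static const char *const load[]    = { "idle", "1 hog", "2 hogs", "4 hogs", "8 hogs" };

/* Relative saving, in percent, when going from 'base' to 'reduced'. */
static double percent_reduction(double base, double reduced)
{
	assert(base > 0.0);
	return (base - reduced) / base * 100.0;
}

void print_reductions(void)
{
	for (size_t i = 0; i < sizeof(nohz_full_w) / sizeof(nohz_full_w[0]); i++)
		printf("%-7s %6.2f%%\n", load[i],
		       percent_reduction(nohz_full_w[i], aggressive_w[i]));
}
```

print_reductions() prints 40.04%, 30.97%, 36.83%, 2.94% and 0.05%, matching the last row of the table.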

That said, I doubt this is acceptable given the side effects that
timer inaccuracy can cause. I think this breaks some basic assumptions
in both the kernel and user space.

I think what is really happening here is that you are indirectly
simulating the results of what Anna-Maria is working on. That is, you
just prevent an idle CPU from waking up to handle interrupts when it is
not needed.
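A toy illustration of the idea described above (keeping the timekeeping duty on a busy CPU so that idle CPUs are not woken up for it) could look like the following. This is a hypothetical model only, not Anna-Maria's timer work, whose design is not reproduced here; pick_timekeeper() and NO_TIMEKEEPER are invented names.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_TIMEKEEPER (-1) /* hypothetical "nobody owns the duty" marker */

/*
 * Toy model: keep the duty on the current owner while it is busy,
 * otherwise hand it to the first busy CPU. If every CPU is idle,
 * nobody is woken up just to keep ticking.
 */
static int pick_timekeeper(const bool *busy, int nr_cpus, int current)
{
	assert(nr_cpus > 0);
	if (current >= 0 && current < nr_cpus && busy[current])
		return current;
	for (int cpu = 0; cpu < nr_cpus; cpu++)
		if (busy[cpu])
			return cpu;
	return NO_TIMEKEEPER;
}
```

Preferring the current owner avoids bouncing the duty between CPUs on every decision.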

Anna-Maria,

Do you have some patches that Andrea could test with?

Thanks,

-- Steve

> Conclusion:
> 
> nohz_full_aggressive used together with nohz_full=all saves some
> energy when the system is idle or under low CPU usage (e.g., when
> fewer than half of the CPUs are busy).
> 
> Under high CPU load, power consumption is nearly identical to
> nohz_full=all alone, because the power and interrupts saved on the
> timekeeping CPU contribute very little to the total energy
> consumption.
> 
> However, enabling nohz_full_aggressive can lead to timing inaccuracies
> in the system, because periodic ticks can also be disabled on the
> timekeeping CPU.
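One simple, hedged way to look for high-resolution timer inaccuracies from user space is to measure how late a short sleep wakes up. The sketch assumes a POSIX system with clock_nanosleep() and CLOCK_MONOTONIC; sleep_overshoot_ns() is a hypothetical helper, and running it repeatedly with nohz_full_aggressive enabled and disabled would only give a rough comparison.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

static int64_t ts_ns(const struct timespec *t)
{
	return (int64_t)t->tv_sec * 1000000000LL + t->tv_nsec;
}

/*
 * Sleep for req_ns on CLOCK_MONOTONIC and return how many nanoseconds
 * later than requested the task woke up, or -1 on error/interruption.
 */
static int64_t sleep_overshoot_ns(int64_t req_ns)
{
	struct timespec start, end;
	struct timespec req = {
		.tv_sec = req_ns / 1000000000LL,
		.tv_nsec = req_ns % 1000000000LL,
	};

	assert(req_ns >= 0);
	if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
		return -1;
	/* clock_nanosleep() returns an error number instead of setting errno. */
	if (clock_nanosleep(CLOCK_MONOTONIC, 0, &req, NULL) != 0)
		return -1;
	if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
		return -1;
	return ts_ns(&end) - ts_ns(&start) - req_ns;
}
```

A relative sleep never returns early on the same clock, so a successful call always yields a non-negative overshoot; large or erratic values are what one would look for.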
> 
> Note:
> 
> I wrote this patch while I was stuck at the airport, because my flight
> was delayed and I was trying to optimize the battery usage of my laptop
> in more creative ways. Ultimately I ended up wasting a lot more energy
> testing this patch, but at least the long wait wasn't too boring.
> 
> Signed-off-by: Andrea Righi <andrea.righi@xxxxxxxxxxxxx>
> ---
>  .../ABI/testing/sysfs-devices-system-cpu      | 12 ++++++++++++
>  .../admin-guide/kernel-parameters.txt         |  7 +++++++
>  Documentation/timers/no_hz.rst                |  5 +++++
>  drivers/base/cpu.c                            | 19 +++++++++++++++++++
>  include/linux/tick.h                          |  7 +++++++
>  kernel/time/hrtimer.c                         |  7 ++++++-
>  kernel/time/tick-sched.c                      | 16 +++++++++++++---
>  7 files changed, 69 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
> index f54867cadb0f..aa620e154d54 100644
> --- a/Documentation/ABI/testing/sysfs-devices-system-cpu
> +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
> @@ -679,6 +679,18 @@ Description:
>  		(RO) the list of CPUs that are in nohz_full mode.
>  		These CPUs are set by boot parameter "nohz_full=".
>  
> +What:		/sys/devices/system/cpu/nohz_full_aggressive
> +Date:		Apr 2023
> +Contact:	Linux kernel mailing list <linux-kernel@xxxxxxxxxxxxxxx>
> +Description:
> +		(RW) enable/disable nohz_full also for the timekeeping CPU.
> +
> +		WARNING: enabling this option can cause potential
> +		high-resolution timer inaccuracies in the system.
> +
> +		This option can be set by boot parameter
> +		"nohz_full_aggressive".
> +
>  What:		/sys/devices/system/cpu/isolated
>  Date:		Apr 2015
>  Contact:	Linux kernel mailing list <linux-kernel@xxxxxxxxxxxxxxx>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 9e5bab29685f..23c6fe20e067 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3732,6 +3732,13 @@
>  			Note that this argument takes precedence over
>  			the CONFIG_RCU_NOCB_CPU_DEFAULT_ALL option.
>  
> +	nohz_full_aggressive
> +			[KNL,BOOT,SMP,ISOL] allow enabling nohz_full for the
> +			timekeeping CPU as well.
> +
> +			WARNING: enabling this option can cause potential
> +			high-resolution timer inaccuracies in the system.
> +
>  	noinitrd	[RAM] Tells the kernel not to load any configured
>  			initial RAM disk.
>  
> diff --git a/Documentation/timers/no_hz.rst b/Documentation/timers/no_hz.rst
> index f8786be15183..aa9f79297d77 100644
> --- a/Documentation/timers/no_hz.rst
> +++ b/Documentation/timers/no_hz.rst
> @@ -136,6 +136,11 @@ error message, and the boot CPU will be removed from the mask.  Note that
>  this means that your system must have at least two CPUs in order for
>  CONFIG_NO_HZ_FULL=y to do anything for you.
>  
> +This constraint can be relaxed by passing the parameter "nohz_full_aggressive".
> +With this option enabled, the timekeeping CPU can also be configured to use
> +adaptive ticks, at the cost of potential high-resolution timer inaccuracies
> +in the system.
> +
>  Finally, adaptive-ticks CPUs must have their RCU callbacks offloaded.
>  This is covered in the "RCU IMPLICATIONS" section below.
>  
> diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
> index c1815b9dae68..b55d6111a733 100644
> --- a/drivers/base/cpu.c
> +++ b/drivers/base/cpu.c
> @@ -280,6 +280,24 @@ static ssize_t print_cpus_nohz_full(struct device *dev,
>  	return sysfs_emit(buf, "%*pbl\n", cpumask_pr_args(tick_nohz_full_mask));
>  }
>  static DEVICE_ATTR(nohz_full, 0444, print_cpus_nohz_full, NULL);
> +
> +static ssize_t
> +nohz_full_aggressive_show(struct device *dev, struct device_attribute *attr,
> +			  char *buf)
> +{
> +	return sysfs_emit(buf, "%d\n", tick_nohz_full_aggressive);
> +}
> +
> +static ssize_t nohz_full_aggressive_store(struct device *dev,
> +					  struct device_attribute *attr,
> +					  const char *buf, size_t count)
> +{
> +	if (kstrtobool(buf, &tick_nohz_full_aggressive))
> +		return -EINVAL;
> +	return count;
> +}
> +
> +static DEVICE_ATTR_RW(nohz_full_aggressive);
>  #endif
>  
>  static void cpu_device_release(struct device *dev)
> @@ -468,6 +486,7 @@ static struct attribute *cpu_root_attrs[] = {
>  	&dev_attr_isolated.attr,
>  #ifdef CONFIG_NO_HZ_FULL
>  	&dev_attr_nohz_full.attr,
> +	&dev_attr_nohz_full_aggressive.attr,
>  #endif
>  #ifdef CONFIG_GENERIC_CPU_AUTOPROBE
>  	&dev_attr_modalias.attr,
> diff --git a/include/linux/tick.h b/include/linux/tick.h
> index 9459fef5b857..8d557838b3f6 100644
> --- a/include/linux/tick.h
> +++ b/include/linux/tick.h
> @@ -176,6 +176,7 @@ static inline void tick_nohz_idle_stop_tick_protected(void) { }
>  
>  #ifdef CONFIG_NO_HZ_FULL
>  extern bool tick_nohz_full_running;
> +extern bool tick_nohz_full_aggressive;
>  extern cpumask_var_t tick_nohz_full_mask;
>  
>  static inline bool tick_nohz_full_enabled(void)
> @@ -186,6 +187,11 @@ static inline bool tick_nohz_full_enabled(void)
>  	return tick_nohz_full_running;
>  }
>  
> +static inline bool tick_nohz_full_aggressive_enabled(void)
> +{
> +	return !!tick_nohz_full_aggressive;
> +}
> +
>  /*
>   * Check if a CPU is part of the nohz_full subset. Arrange for evaluating
>   * the cpu expression (typically smp_processor_id()) _after_ the static
> @@ -276,6 +282,7 @@ extern void __tick_nohz_task_switch(void);
>  extern void __init tick_nohz_full_setup(cpumask_var_t cpumask);
>  #else
>  static inline bool tick_nohz_full_enabled(void) { return false; }
> +static inline bool tick_nohz_full_aggressive_enabled(void) { return false; }
>  static inline bool tick_nohz_full_cpu(int cpu) { return false; }
>  static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask) { }
>  
> diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
> index e8c08292defc..b3f27c6c8475 100644
> --- a/kernel/time/hrtimer.c
> +++ b/kernel/time/hrtimer.c
> @@ -1866,7 +1866,12 @@ void hrtimer_interrupt(struct clock_event_device *dev)
>  	else
>  		expires_next = ktime_add(now, delta);
>  	tick_program_event(expires_next, 1);
> -	pr_warn_once("hrtimer: interrupt took %llu ns\n", ktime_to_ns(delta));
> +	/*
> +	 * This is a "normal" condition when nohz_full_aggressive mode is
> +	 * enabled, so avoid printing this warning in this case.
> +	 */
> +	if (!tick_nohz_full_aggressive_enabled())
> +		pr_warn_once("hrtimer: interrupt took %llu ns\n", ktime_to_ns(delta));
>  }
>  
>  /* called with interrupts disabled */
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 52254679ec48..8864066e4746 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -188,7 +188,8 @@ static void tick_sched_do_timer(struct tick_sched *ts, ktime_t now)
>  	 */
>  	if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE)) {
>  #ifdef CONFIG_NO_HZ_FULL
> -		WARN_ON_ONCE(tick_nohz_full_running);
> +		if (!tick_nohz_full_aggressive_enabled())
> +			WARN_ON_ONCE(tick_nohz_full_running);
>  #endif
>  		tick_do_timer_cpu = cpu;
>  	}
> @@ -250,6 +251,8 @@ cpumask_var_t tick_nohz_full_mask;
>  EXPORT_SYMBOL_GPL(tick_nohz_full_mask);
>  bool tick_nohz_full_running;
>  EXPORT_SYMBOL_GPL(tick_nohz_full_running);
> +bool tick_nohz_full_aggressive;
> +EXPORT_SYMBOL_GPL(tick_nohz_full_aggressive);
>  static atomic_t tick_dep_mask;
>  
>  static bool check_tick_dependency(atomic_t *dep)
> @@ -524,6 +527,13 @@ void __tick_nohz_task_switch(void)
>  	}
>  }
>  
> +static int __init tick_nohz_full_aggressive_setup(char *str)
> +{
> +	tick_nohz_full_aggressive = true;
> +	return 1;
> +}
> +__setup("nohz_full_aggressive", tick_nohz_full_aggressive_setup);
> +
>  /* Get the boot-time nohz CPU list from the kernel parameters. */
>  void __init tick_nohz_full_setup(cpumask_var_t cpumask)
>  {
> @@ -854,7 +864,7 @@ static ktime_t tick_nohz_next_event(struct tick_sched *ts, int cpu)
>  	 * Otherwise we can sleep as long as we want.
>  	 */
>  	delta = timekeeping_max_deferment();
> -	if (cpu != tick_do_timer_cpu &&
> +	if ((tick_nohz_full_aggressive_enabled() || cpu != tick_do_timer_cpu) &&
>  	    (tick_do_timer_cpu != TICK_DO_TIMER_NONE || !ts->do_timer_last))
>  		delta = KTIME_MAX;
>  
> @@ -1073,7 +1083,7 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
>  	if (unlikely(report_idle_softirq()))
>  		return false;
>  
> -	if (tick_nohz_full_enabled()) {
> +	if (tick_nohz_full_enabled() && !tick_nohz_full_aggressive_enabled()) {
>  		/*
>  		 * Keep the tick alive to guarantee timekeeping progression
>  		 * if there are full dynticks CPUs around



