On Fri, 23 Oct 2015, Sergey Senozhatsky wrote:

> On (10/23/15 06:43), Christoph Lameter wrote:
> > Is this ok?
>
> kernel/sched/loadavg.c: In function ‘calc_load_enter_idle’:
> kernel/sched/loadavg.c:195:2: error: implicit declaration of function ‘quiet_vmstat’ [-Werror=implicit-function-declaration]
>   quiet_vmstat();
>   ^

Oww... Not good to do that in the scheduler. Ok, new patch follows that
does the call from tick_nohz_stop_sched_tick() instead. Hopefully that is
the right location to call quiet_vmstat().

> > +	if (!cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off))
> > +		cancel_delayed_work(this_cpu_ptr(&vmstat_work));
>
> shouldn't preemption be disabled for smp_processor_id() here?

Preemption is disabled when quiet_vmstat() is called.


Subject: Fix vmstat: make vmstat_updater deferrable again and shut down on idle V2

V1->V2
- Call quiet_vmstat() from tick_nohz_stop_sched_tick() instead.

Currently the vmstat updater is not deferrable as a result of commit
ba4877b9ca51f80b5d30f304a46762f0509e1635. This in turn can cause multiple
interruptions of applications because the vmstat updater may run at
different times than tick processing. No good.

Make vmstat_update deferrable again and provide a function that shuts
down the vmstat updater when we go idle by folding the differentials.
Shut it down from tick_nohz_stop_sched_tick(), the NOHZ code that stops
the tick.

Note that the shepherd thread will continue scanning the differentials
from another processor and will re-enable the vmstat workers if it
detects any changes.

Fixes: ba4877b9ca51f80b5d30f304a46762f0509e1635 (do not use deferrable delay)
Signed-off-by: Christoph Lameter <cl@xxxxxxxxx>
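
As an aside, for anyone wondering what the DECLARE_DEFERRABLE_WORK change
in the diff buys us: a deferrable delayed work item is backed by a
deferrable timer, so an idle cpu is not woken up just to run it; the work
simply runs the next time that cpu wakes up for some other reason. Below
is a minimal standalone sketch, not part of this patch; the module name,
message and 1 second period are made up purely for illustration.

#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/jiffies.h>

static void demo_fn(struct work_struct *w);

/* Deferrable: the backing timer does not force an idle cpu out of sleep. */
static DECLARE_DEFERRABLE_WORK(demo_work, demo_fn);

static void demo_fn(struct work_struct *w)
{
	pr_info("deferrable demo work ran\n");
	/* Re-arm periodically, the way the vmstat updater re-arms itself. */
	schedule_delayed_work(&demo_work, HZ);
}

static int __init demo_init(void)
{
	schedule_delayed_work(&demo_work, HZ);
	return 0;
}

static void __exit demo_exit(void)
{
	cancel_delayed_work_sync(&demo_work);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");

With DECLARE_DELAYED_WORK instead, the underlying timer would fire on the
idle cpu and interrupt it, which is exactly the kind of wakeup this patch
is trying to avoid.
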
Index: linux/mm/vmstat.c
===================================================================
--- linux.orig/mm/vmstat.c
+++ linux/mm/vmstat.c
@@ -1395,6 +1395,20 @@ static void vmstat_update(struct work_st
 }
 
 /*
+ * Switch off vmstat processing and then fold all the remaining differentials
+ * until the diffs stay at zero. The function is used by NOHZ and can only be
+ * invoked when tick processing is not active.
+ */
+void quiet_vmstat(void)
+{
+	do {
+		if (!cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off))
+			cancel_delayed_work(this_cpu_ptr(&vmstat_work));
+
+	} while (refresh_cpu_vm_stats());
+}
+
+/*
  * Check if the diffs for a certain cpu indicate that
  * an update is needed.
  */
@@ -1426,7 +1440,7 @@ static bool need_update(int cpu)
  */
 static void vmstat_shepherd(struct work_struct *w);
 
-static DECLARE_DELAYED_WORK(shepherd, vmstat_shepherd);
+static DECLARE_DEFERRABLE_WORK(shepherd, vmstat_shepherd);
 
 static void vmstat_shepherd(struct work_struct *w)
 {
Index: linux/include/linux/vmstat.h
===================================================================
--- linux.orig/include/linux/vmstat.h
+++ linux/include/linux/vmstat.h
@@ -211,6 +211,7 @@ extern void __inc_zone_state(struct zone
 extern void dec_zone_state(struct zone *, enum zone_stat_item);
 extern void __dec_zone_state(struct zone *, enum zone_stat_item);
 
+void quiet_vmstat(void);
 void cpu_vm_stats_fold(int cpu);
 void refresh_zone_stat_thresholds(void);
@@ -272,6 +273,7 @@ static inline void __dec_zone_page_state
 static inline void refresh_cpu_vm_stats(int cpu) { }
 static inline void refresh_zone_stat_thresholds(void) { }
 static inline void cpu_vm_stats_fold(int cpu) { }
+static inline void quiet_vmstat(void) { }
 
 static inline void drain_zonestat(struct zone *zone,
 			struct per_cpu_pageset *pset) { }
Index: linux/kernel/time/tick-sched.c
===================================================================
--- linux.orig/kernel/time/tick-sched.c
+++ linux/kernel/time/tick-sched.c
@@ -667,6 +667,7 @@ static ktime_t tick_nohz_stop_sched_tick
 	 */
 	if (!ts->tick_stopped) {
 		nohz_balance_enter_idle(cpu);
+		quiet_vmstat();
 		calc_load_enter_idle();
 
 		ts->last_tick = hrtimer_get_expires(&ts->sched_timer);