On Sat 17-01-15 13:48:34, Christoph Lameter wrote:
> On Sat, 17 Jan 2015, Vinayak Menon wrote:
>
> > which had not updated the vmstat_diff. This CPU was in idle for around 30
> > secs. When I looked at the tvec base for this CPU, the timer associated with
> > vmstat_update had its expiry time less than current jiffies. This timer had
> > its deferrable flag set, and was tied to the next non-deferrable timer in the
>
> We can remove the deferrable flag now since the vmstat threads are only
> activated as necessary with the recent changes. Looks like this could fix
> your issue?

OK, I have checked the history and the deferrable behavior was
introduced by 39bf6270f524 (VM statistics: Make timer deferrable), which
didn't offer any numbers that would justify the change. So I think it
would be a good idea to revert this one as it can clearly cause issues.

Could you retest with this change? It still wouldn't help with highly
overloaded workqueues, but that sounds like a bigger change and this one
sounds quite safe to me, so it is a good start.
---
From 12d00a8066e336d3e1311600b50fa9b588798448 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@xxxxxxx>
Date: Mon, 26 Jan 2015 18:07:51 +0100
Subject: [PATCH] vmstat: Do not use deferrable delayed work for vmstat_update

Vinayak Menon has reported that an excessive number of tasks were
throttled in the direct reclaim inside too_many_isolated because
NR_ISOLATED_FILE was relatively high compared to NR_INACTIVE_FILE.
However it turned out that the real number of NR_ISOLATED_FILE was 0 and
the per-cpu vm_stat_diff wasn't transferred into the global counter.

vmstat_work, which is responsible for the sync, is defined as deferrable
delayed work, which means that the defined timeout doesn't wake up an
idle CPU.
A CPU might stay in an idle state for a long time and general effort is to
keep such a CPU in this state as long as possible which might lead to all
sorts of troubles for vmstat consumers as can be seen with the excessive
direct reclaim throttling.

This patch basically reverts 39bf6270f524 (VM statistics: Make timer
deferrable) but it shouldn't cause any problems for idle CPUs because
only CPUs with an active per-cpu drift are woken up since 7cc36bbddde5
(vmstat: on-demand vmstat workers v8) and CPUs which are idle for a
longer time shouldn't have per-cpu drift.

Fixes: 39bf6270f524 (VM statistics: Make timer deferrable)
Reported-and-debugged-by: Vinayak Menon <vinmenon@xxxxxxxxxxxxxx>
Signed-off-by: Michal Hocko <mhocko@xxxxxxx>
---
 mm/vmstat.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index c95d6b39ac91..b9b9deec1d54 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1453,7 +1453,7 @@ static void __init start_shepherd_timer(void)
 	int cpu;
 
 	for_each_possible_cpu(cpu)
-		INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu),
+		INIT_DELAYED_WORK(per_cpu_ptr(&vmstat_work, cpu),
 			vmstat_update);
 
 	if (!alloc_cpumask_var(&cpu_stat_off, GFP_KERNEL))
-- 
2.1.4

-- 
Michal Hocko
SUSE Labs
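
For illustration only, here is a small user-space model of the drift
described in the changelog. It is not kernel code: NR_CPUS,
STAT_THRESHOLD, mod_state(), sync_tick() and run_scenario() are
hypothetical names made up for this sketch, and the model assumes a
deferrable sync simply never runs on an idle CPU (in reality it would run
once the CPU wakes up for some other reason).

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS        4   /* hypothetical CPU count for this model */
#define STAT_THRESHOLD 32  /* hypothetical per-cpu fold threshold */

static long global_isolated;     /* models the global NR_ISOLATED_FILE */
static long cpu_diff[NR_CPUS];   /* models the per-cpu vm_stat_diff */
static bool cpu_idle[NR_CPUS];   /* true once a CPU has gone idle */

/* Account a delta on one CPU; fold into the global counter only once
 * the local diff exceeds the threshold in either direction. */
static void mod_state(int cpu, long delta)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	cpu_diff[cpu] += delta;
	if (cpu_diff[cpu] > STAT_THRESHOLD || cpu_diff[cpu] < -STAT_THRESHOLD) {
		global_isolated += cpu_diff[cpu];
		cpu_diff[cpu] = 0;
	}
}

/* Periodic sync. A deferrable sync skips idle CPUs, so their pending
 * diffs never reach the global counter in this model. */
static void sync_tick(bool deferrable)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (deferrable && cpu_idle[cpu])
			continue;
		global_isolated += cpu_diff[cpu];
		cpu_diff[cpu] = 0;
	}
}

/* Isolate 33 pages on CPU 0, put them back on CPUs 1 and 2, let those
 * two CPUs go idle, then run one sync. Returns the global counter. */
static long run_scenario(bool deferrable)
{
	global_isolated = 0;
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		cpu_diff[cpu] = 0;
		cpu_idle[cpu] = false;
	}

	mod_state(0, 33);   /* crosses the threshold, folded immediately */
	mod_state(1, -20);  /* stays in the per-cpu diff */
	mod_state(2, -13);  /* stays in the per-cpu diff */
	cpu_idle[1] = true;
	cpu_idle[2] = true;

	sync_tick(deferrable);
	return global_isolated;
}
```

In this model the real number of isolated pages is 0 in both cases, but
run_scenario(true) leaves the global counter at 33 because the negative
diffs are stuck on the idle CPUs, while run_scenario(false) brings it
back to 0.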