On Wed, 30 Jul 2014, Lai Jiangshan wrote: > I think the bug is here, it re-queues the per_cpu(vmstat_work, cpu) which is offline > (after vmstat_cpuup_callback(CPU_DOWN_PREPARE). And cpu_stat_off is accessed without > proper lock. Ok. I guess we need to make the preemption check output more information so that it tells us that an operation was performed on a processor that is down? > I suggest to use get_cpu_online() or a new cpu_stat_off_mutex to protect it. If a processor is downed then cpu_stat_off bit should be cleared but also the worker thread should not run. > > case CPU_DOWN_PREPARE: > > case CPU_DOWN_PREPARE_FROZEN: > > - cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu)); > > - per_cpu(vmstat_work, cpu).work.func = NULL; > > + if (!cpumask_test_and_set_cpu(cpu, cpu_stat_off)) > > + cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu)); > > It is suggest that cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu)) should > be called unconditionally. And the cpu should be cleared from cpu_stat_off. > (you set it, it is BUG according to vmstat_shepherd() and the semantics of the > cpu_stat_off). True. Subject: vmstat ondemand: Fix online/offline races Do not allow onlining/offlining while the shepherd task is checking for vmstat threads. On offlining a processor do the right thing cancelling the vmstat worker thread if it exista and also exclude it from the shepherd process checks. Signed-off-by: Christoph Lameter <cl@xxxxxxxxx> Index: linux/mm/vmstat.c =================================================================== --- linux.orig/mm/vmstat.c 2014-07-30 09:35:54.602662306 -0500 +++ linux/mm/vmstat.c 2014-07-30 09:43:07.109037043 -0500 @@ -1317,6 +1317,7 @@ static void vmstat_shepherd(struct work_ { int cpu; + get_online_cpus(); /* Check processors whose vmstat worker threads have been disabled */ for_each_cpu(cpu, cpu_stat_off) if (need_update(cpu) && @@ -1325,6 +1326,7 @@ static void vmstat_shepherd(struct work_ schedule_delayed_work_on(cpu, &per_cpu(vmstat_work, cpu), __round_jiffies_relative(sysctl_stat_interval, cpu)); + put_online_cpus(); schedule_delayed_work(&shepherd, round_jiffies_relative(sysctl_stat_interval)); @@ -1380,8 +1382,8 @@ static int vmstat_cpuup_callback(struct break; case CPU_DOWN_PREPARE: case CPU_DOWN_PREPARE_FROZEN: - if (!cpumask_test_and_set_cpu(cpu, cpu_stat_off)) - cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu)); + cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu)); + cpumask_clear_cpu(cpu, cpu_stat_off); break; case CPU_DOWN_FAILED: case CPU_DOWN_FAILED_FROZEN: -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>