On Tuesday 11 November 2008 21:22:14 Ingo Molnar wrote:> * Rafael J. Wysocki <rjw@xxxxxxx> wrote:> > So, it evidently fails while re-enabling the non-boot CPU and not> > during disabling it as I thought before. (Resend, due to HTML version previously) But what is calling stop_machine in that path? There *is* a race, but I don't think it could cause this (we should make acopy of active.fnret inside the lock before returning it). Two patches: one fixes that race, the next adds debugging spew. stop_machine: fix race with return value We should not access active.fnret outside the lock; in theory the nextstop_machine could overwrite it. Signed-off-by: Rusty Russell <rusty@xxxxxxxxxxxxxxx>--- kernel/stop_machine.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff -r d7c9a15da615 kernel/stop_machine.c--- a/kernel/stop_machine.c Mon Nov 10 09:47:45 2008 +1100+++ b/kernel/stop_machine.c Tue Nov 11 23:19:47 2008 +1030@@ -112,7 +112,7 @@ int __stop_machine(int (*fn)(void *), void *data, const cpumask_t *cpus) { struct work_struct *sm_work;- int i;+ int i, ret; /* Set up initial state. */ mutex_lock(&lock);@@ -137,8 +137,9 @@ /* This will release the thread on our CPU. */ put_cpu(); flush_workqueue(stop_machine_wq);+ ret = active.fnret; mutex_unlock(&lock);- return active.fnret;+ return ret; } int stop_machine(int (*fn)(void *), void *data, const cpumask_t *cpus)===diff -r fe7dd39b1cff kernel/stop_machine.c--- a/kernel/stop_machine.c Wed Nov 12 14:07:18 2008 +1030+++ b/kernel/stop_machine.c Wed Nov 12 14:09:08 2008 +1030@@ -89,6 +89,8 @@ case STOPMACHINE_RUN: /* On multiple CPUs only a single error code * is needed to tell that something failed. */+ printk("stop_machine: %i running %p\n",+ smp_processor_id(), smdata->fn); err = smdata->fn(smdata->data); if (err) smdata->fnret = err;@@ -106,6 +108,7 @@ /* Callback for CPUs which aren't supposed to do anything. */ static int chill(void *unused) {+ printk("stop_machine: %i chilling\n", smp_processor_id()); return 0; } @@ -126,17 +129,23 @@ set_state(STOPMACHINE_PREPARE); + printk("stop_machine: running on %i cpus:\n", num_threads);+ dump_stack();+ /* Schedule the stop_cpu work on all cpus: hold this CPU so one * doesn't hit this CPU until we're ready. */ get_cpu(); for_each_online_cpu(i) {+ printk("stop_machine: setting up cpu %i\n", i); sm_work = percpu_ptr(stop_machine_work, i); INIT_WORK(sm_work, stop_cpu); queue_work_on(i, stop_machine_wq, sm_work); } /* This will release the thread on our CPU. */+ printk("stop_machine: releasing CPU %i\n", smp_processor_id()); put_cpu(); flush_workqueue(stop_machine_wq);+ printk("stop_machine: done\n"); ret = active.fnret; mutex_unlock(&lock); return ret; ÿôèº{.nÇ+?·?®??+%?Ëÿ±éݶ¥?wÿº{.nÇ+?·¤z¹Þ?û^²×«³ø§¶?¡Ü¨}©?²Æ zÚ&j:+v?¨þø¯ù®w¥þ?à2?Þ?¨èÚ&¢)ß¡«a¶Úÿÿûàz¿äz¹Þ?ú+?ù???Ý¢jÿ?wèþf