On Wednesday, 12 of November 2008, Rusty Russell wrote:
> On Tuesday 11 November 2008 21:22:14 Ingo Molnar wrote:
> > * Rafael J. Wysocki <rjw@xxxxxxx> wrote:
> > > So, it evidently fails while re-enabling the non-boot CPU and not
> > > during disabling it as I thought before.
>
> (Resend, due to HTML version previously)
>
> But what is calling stop_machine in that path?
>
> There *is* a race, but I don't think it could cause this (we should make a
> copy of active.fnret inside the lock before returning it).

Still, that seems to be the case.

> Two patches: one fixes that race, the next adds debugging spew.
>
> stop_machine: fix race with return value

With this patch applied (reproduced below for clarity) the problem is not
reproducible any more.

Care to push it upstream ASAP?

Thanks,
Rafael

---
stop_machine: fix race with return value

We should not access active.fnret outside the lock; in theory the next
stop_machine could overwrite it.

Signed-off-by: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
---
 kernel/stop_machine.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff -r d7c9a15da615 kernel/stop_machine.c
--- a/kernel/stop_machine.c	Mon Nov 10 09:47:45 2008 +1100
+++ b/kernel/stop_machine.c	Tue Nov 11 23:19:47 2008 +1030
@@ -112,7 +112,7 @@
 int __stop_machine(int (*fn)(void *), void *data, const cpumask_t *cpus)
 {
 	struct work_struct *sm_work;
-	int i;
+	int i, ret;
 
 	/* Set up initial state. */
 	mutex_lock(&lock);
@@ -137,8 +137,9 @@
 	/* This will release the thread on our CPU. */
 	put_cpu();
 	flush_workqueue(stop_machine_wq);
+	ret = active.fnret;
 	mutex_unlock(&lock);
-	return active.fnret;
+	return ret;
 }
 
 int stop_machine(int (*fn)(void *), void *data, const cpumask_t *cpus)
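
For readers outside the kernel tree, the pattern the patch applies (copy the
shared result while the lock is still held, then return the copy) can be
sketched in userspace C. This is only an illustration under stated
assumptions: it uses a pthread mutex in place of the kernel mutex, and
`shared_state`, `run_serialized` and `double_it` are hypothetical names,
not kernel code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the shared `active` state in the patch. */
struct shared_state {
	int (*fn)(void *);
	void *data;
	int fnret;
};

static struct shared_state active;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Run fn(data) under the lock and return its result. */
static int run_serialized(int (*fn)(void *), void *data)
{
	int ret;

	assert(fn != NULL);

	pthread_mutex_lock(&lock);
	active.fn = fn;
	active.data = data;
	active.fnret = active.fn(active.data);

	/* Copy the result while the lock is still held. */
	ret = active.fnret;
	pthread_mutex_unlock(&lock);

	/* Return the private copy, never the shared field. */
	return ret;
}

/* Hypothetical example callback: doubles the int it is given. */
static int double_it(void *data)
{
	return 2 * *(int *)data;
}
```

Returning `active.fnret` after `pthread_mutex_unlock()` would reopen the
window the patch closes: another caller could take the lock and overwrite
the field before it is read.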