Hello everyone, I am running into a deadlock while testing bmips power management code. Currently attempting to add power management functionality to arch/mips/configs/bmips_stb_defconfig. The kernel locks up when coming back from a suspend state. I am working on a bcm7435 board. In arch/mips/kernel/smp.c:start_secondary --- asmlinkage void start_secondary(void) { .... set_cpu_online(cpu, true); set_cpu_sibling_map(cpu); set_cpu_core_map(cpu); calculate_cpu_foreign_map(); cpumask_set_cpu(cpu, &cpu_callin_map); synchronise_count_slave(cpu); .... } --- The deadlock occurs because the set_cpu_online() is called before synchronise_count_slave(). This can cause a deadlock if the boot cpu sees that the secondary cpu is online and tries to execute a function on it before it synchronizes with it. The boot cpu ends up waiting for the secondary cpu to execute a function, while the secondary cpu waits for the boot cpu to synchronise with it. Lets assume the following occurs. 1. CPU0 starts CPU1. CPU0 starts waiting for CPU1 to start up. CPU0 ends up at arch/mips/kernel/smp.c:__cpu_up(). --- int __cpu_up(unsigned int cpu, struct task_struct *tidle) { mp_ops->boot_secondary(cpu, tidle); /* * Trust is futile. We should really have timeouts ... */ while (!cpumask_test_cpu(cpu, &cpu_callin_map)) { udelay(100); schedule(); } synchronise_count_master(cpu); return 0; } --- While CPU0 waits for CPU1 it schedules another thread. 2. CPU0 begins executing a new thread and eventually ends up at kernel/smp.c:smp_call_function_many() --- void smp_call_function_many(const struct cpumask *mask, smp_call_func_t func, void *info, bool wait) { .... /* No online cpus? We're done. */ if (cpu >= nr_cpu_ids) return; /* Do we have another CPU which isn't us? */ next_cpu = cpumask_next_and(cpu, mask, cpu_online_mask); if (next_cpu == this_cpu) next_cpu = cpumask_next_and(next_cpu, mask, cpu_online_mask); /* Fastpath: do that cpu by itself. */ if (next_cpu >= nr_cpu_ids) { smp_call_function_single(cpu, func, info, wait); return; } .... } --- 3. CPU1 executes set_cpu_online() and blocks at synchronise_count_slave(). Thus CPU1 is blocked, however it tells everyone it is online. 4. CPU0(in kernel/smp.c:smp_call_function_many()) sees that one CPU is online and attempts to a run a function on that CPU(which is CPU1). CPU0 then blocks with no preempt and irqs off. Thus both CPUs are deadlocked. CPU0 is blocked at smp_call_function_single() CPU1 is blocked at synchronise_count_slave() I am running into this issue with this execution. kernel/power/standby.c: suspend_enter() kernel/power/standby.c: syscore_resume() (Coming out of suspend, only 1 cpu up) kernel/time/timekeeping.c: timekeeping_resume() (syscore calls the timekeeping resume hook) kernel/time/hrtimer.c: hrtimer_resume() kernel/time/hrtimer.c: clock_was_set_delayed() (We schedule some work for later) ... CPU0 then starts up CPU1 arch/mips/kernel/smp.c: __cpu_up() (Comes here and decides to schedule the hrtimer thread) kernel/time/hrtimer.c: clock_was_set_work() kernel/time/hrtimer.c: clock_was_set() kernel/time/hrtimer.c: on_each_cpu() (and this is where we get screwed) ... kernel/smp.c: smp_call_function_many() (Eventually gets here and blocks if CPU1 already executed set_cpu_online()) The deadlock doesn't happen when I test pm with no_console_suspend. I am assuming this happens because things get printed out before "set_cpu_online()" gets executed. Thus delaying the timing. Then CPU0 does not see that CPU1 is online when running "smp_call_function_many()". Am I seeing this correctly? What would be the proper fix to this? Thanks, Justin