On Wed, Dec 16, 2009 at 3:18 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Wed, 2009-12-16 at 12:24 +0530, Sachin Sant wrote:
>> Xiaotian Feng wrote:
>> > On Wed, Dec 16, 2009 at 2:41 PM, Sachin Sant <sachinp@xxxxxxxxxx> wrote:
>> >
>> >> Xiaotian Feng wrote:
>> >>
>> >>> Does this testcase hotplug cpu 0 off?
>> >>>
>> >>>
>> >> No, i don't think so. It skips cpu0 during online/offline
>> >> process.
>> >>
>> >
>> > Then how could this happen ? Looks like cpu 0 is offline ....
>> > 0:mon> <4>IRQ 17 affinity broken off cpu 0
>> > <4>IRQ 18 affinity broken off cpu 0
>> > <4>IRQ 19 affinity broken off cpu 0
>> > <4>IRQ 264 affinity broken off cpu 0
>> > <4>cpu 0 (hwid 0) Ready to die...
>> > <7>clockevent: decrementer mult[83126e97] shift[32] cpu[0]
>> >
>> Sorry i was looking at only one script. Looking more closely
>> at the test there are 6 different sub tests. The rest of the
>> tests do seem to hotplug CPU 0.
>
> Ooh, cute, so you can actually hotplug cpu 0.. no wonder that didn't get
> exposed on x86.
>
> Still, the only time cpu_active_mask should not be equal to
> cpu_online_mask is when we're in the middle of a hotplug, we clear
> active early and set it late, but its all done under the hotplug mutex,
> so we can at most have 1 cpu differences with online mask.
>

Could the following be possible?
We know there are cpu 0 and cpu 1:

  offline cpu1 > done
  offline cpu0 > false

Consider this in the cpu_down code:

int __ref cpu_down(unsigned int cpu)
{
<snip>
	set_cpu_active(cpu, false);	// here, we set cpu 0 to inactive

	synchronize_sched();

	err = _cpu_down(cpu, 0);

out:
<snip>
}

Then in the _cpu_down code:

static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
{
<snip>
	if (num_online_cpus() == 1)	// if we're trying to offline cpu0, num_online_cpus will be 1
		return -EBUSY;		// after returning to cpu_down, we never set cpu 0 back to active

	if (!cpu_online(cpu))
		return -EINVAL;

	if (!alloc_cpumask_var(&old_allowed, GFP_KERNEL))
		return -ENOMEM;
<snip>
}

Now cpu 0 is online but not active, and then we try to offline cpu1, .......

This is not exposed on x86 because x86 does not have
/sys/devices/system/cpu/cpu0/online.

I guess the following patch fixes this bug.

---
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 291ac58..21ddace 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -199,14 +199,18 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
 		.hcpu = hcpu,
 	};
 
-	if (num_online_cpus() == 1)
+	if (num_online_cpus() == 1) {
+		set_cpu_active(cpu, true);
 		return -EBUSY;
+	}
 
 	if (!cpu_online(cpu))
 		return -EINVAL;
 
-	if (!alloc_cpumask_var(&old_allowed, GFP_KERNEL))
+	if (!alloc_cpumask_var(&old_allowed, GFP_KERNEL)) {
+		set_cpu_active(cpu, true);
 		return -ENOMEM;
+	}
 
 	cpu_hotplug_begin();
 	err = __raw_notifier_call_chain(&cpu_chain, CPU_DOWN_PREPARE | mod,

> Unless of course, I messed up, which appears to be rather likely given
> these problems ;-)
>
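
To make the suspected sequence easier to follow, here is a small userspace sketch of it. This is only a toy model, not kernel code: the struct and every function name in it (cpu_masks, model_cpu_down, and so on) are made up for illustration, two plain bitmasks stand in for cpu_online_mask and cpu_active_mask, and the restore_on_error flag stands in for the proposed set_cpu_active(cpu, true) on the -EBUSY path. The allocation failure path is not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model (hypothetical, not kernel code): one bit per cpu. */
struct cpu_masks {
	unsigned int online;	/* stands in for cpu_online_mask */
	unsigned int active;	/* stands in for cpu_active_mask */
};

/* Count the set bits in the online mask. */
static int count_online(const struct cpu_masks *m)
{
	int n = 0;

	for (unsigned int bits = m->online; bits; bits &= bits - 1)
		n++;
	return n;
}

/* Set or clear one cpu's bit in the active mask. */
static void set_active(struct cpu_masks *m, unsigned int cpu, bool on)
{
	if (on)
		m->active |= 1u << cpu;
	else
		m->active &= ~(1u << cpu);
}

/*
 * Models the early exits of _cpu_down(). With restore_on_error == false
 * the -EBUSY return leaves the cpu inactive, as described above; with
 * true it re-activates the cpu first, as the patch proposes.
 */
static int model_cpu_down_inner(struct cpu_masks *m, unsigned int cpu,
				bool restore_on_error)
{
	if (count_online(m) == 1) {
		if (restore_on_error)
			set_active(m, cpu, true);
		return -EBUSY;
	}

	if (!(m->online & (1u << cpu)))
		return -EINVAL;

	m->online &= ~(1u << cpu);	/* the cpu really goes offline */
	return 0;
}

/* Models cpu_down(): clear active first, then try the real offline. */
static int model_cpu_down(struct cpu_masks *m, unsigned int cpu,
			  bool restore_on_error)
{
	set_active(m, cpu, false);
	return model_cpu_down_inner(m, cpu, restore_on_error);
}
```

Starting from both cpus online and active, calling model_cpu_down() for cpu 1 and then for cpu 0 with restore_on_error set to false leaves cpu 0 online but inactive after the -EBUSY return. With the flag set to true, the two masks agree again.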