On Mon, 26 Jan 2009 18:16:18 +0100 Ingo Molnar <mingo@xxxxxxx> wrote:

>
> * Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> > > > Yet another kernel thread for each CPU.  All because of some dung
> > > > way down in arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c.
> > > >
> > > > Is there no other way?
> > >
> > > Perhaps, but this works.  Trying to be clever got me into this mess in
> > > the first place.
> > >
> > > We could stop using workqueues and change work_on_cpu to create a
> > > thread every time, which would give it a new failure mode so I don't
> > > know that everyone could use it any more.  Or we could keep a single
> > > thread around to do all the cpus, and duplicate much of the workqueue
> > > code.
> > >
> > > None of these options are appealing...
> >
> > Can we try harder please?  10 screenfuls of kernel threads in the ps
> > output is just irritating.
> >
> > How about banning the use of work_on_cpu() from schedule_work() handlers
> > and then fixing that driver somehow?
>
> Yes, but that's fundamentally fragile: anyone who happens to stick the
> wrong thing into keventd (and it's dead easy because schedule_work() is
> easy to use) will lock up work_on_cpu() users.
>

--- a/kernel/workqueue.c~a
+++ a/kernel/workqueue.c
@@ -998,6 +998,8 @@ long work_on_cpu(unsigned int cpu, long
 {
 	struct work_for_cpu wfc;
 
+	BUG_ON(current_is_keventd());
+
 	INIT_WORK(&wfc.work, do_work_for_cpu);
 	wfc.fn = fn;
 	wfc.arg = arg;
_

That wasn't so hard.

> work_on_cpu() is an important (and lowlevel enough) facility to be
> isolated from casual interaction like that.

We have one single (known) caller in the whole kernel.  This is not
worth adding another great pile of kernel threads for!

> > What _is_ the bug anyway?  The only description we were given was
> >
> >   Impact: remove potential clashes with generic kevent workqueue
> >
> >   Annoyingly, some places we want to use work_on_cpu are already in
> >   workqueues.  As per Ingo's suggestion, we create a different
> >   workqueue for work_on_cpu.
> >
> > which didn't bother telling anyone squat.
> >
> > When was this bug added?  Was it added into that driver or was it due to
> > infrastructural changes?
>
> This fixes lockups during bootup caused by the cpumask changes/cleanups
> which changed set_cpus_allowed()+on-kernel-stack-cpumask_t to
> work_on_cpu().
>
> Which was fine except it didnt take into account the interaction with the
> kevents workqueue and the very wide cross section for worklet dependencies
> that this brings with itself. work_on_cpu() was rarely used before so this
> didnt show up.

Am still awaiting an understandable description of this alleged bug :(

If someone can be persuaded to cough up this information (which should
have been in the changelog from day #1) then perhaps someone will be
able to think of a more pleasing fix.  That's one of the reasons for
writing changelogs.