It sounds like the granularity of parallelism is too fine. That is, each
"task" is too short, and the overhead of task dispatching (your task queue
processing, the kernel's thread context switching, any IPC required, etc.)
is longer than the duration of a single task. I hit the same problem a
decade and a half ago when I worked on distributed parallel ray tracing
systems for my postgraduate thesis.

If each task is a pixel, then you may want to consider increasing this to a
bundle of pixels of configurable size. Depending on the algorithm being
parallelized, the bundle may contain contiguous pixels (if processing each
pixel requires approximately uniform processor time) or a random set of
pixels (if there is, or can potentially be, significant variance in
per-pixel processing time). A sketch of the idea is appended after the
quoted message below.

-Dave

-----Original Message-----
From: gimp-developer-bounces@xxxxxxxxxxxxxxxxxxxxxx
[mailto:gimp-developer-bounces@xxxxxxxxxxxxxxxxxxxxxx] On Behalf Of Daniel Egger
Sent: Monday, 21 February 2005 9:13 AM
To: pcg@xxxxxxxx (Marc A. Lehmann)
Cc: Sven Neumann; Developer gimp-devel
Subject: Re: [Gimp-developer] GIMP and multiple processors

On 20.02.2005, at 23:47, <pcg@xxxxxxxx (Marc A. Lehmann)> wrote:

> Linux will not keep two threads running on a single cpu if both are
> ready and nothing else is running, regardless of locality etc., as the
> kernel lacks the tools to effectively decide whether threads should
> stay on a cpu or not.

Yes and no. I just figured out that the tools I was looking for are called
schedutils; they can be used to change the affinity settings of a process,
i.e. pin it to some CPU or allow it to migrate between a set of CPUs as
the kernel decides. Forcing the NPTL implementation to degrade to legacy
pthreads means that one thread equals one process and can thus be
controlled with taskset. Oh yes, and I just noticed that this isn't even
necessary anymore, because for some reason the kernel now migrates one of
the pthread processes to the other CPU automatically after a short while
of processing.

> (I mean, it's of course bad to interleave operations on a per-pixel
> basis instead of e.g. a per-tile basis, but the kernel will run the
> threads concurrently whether or not it gets slower.)

Certainly. Opterons are bandwidth monsters, but this doesn't mean that
they'll be forgiving to stupid algorithms.

> That's quite possible, but IFF the kernel indeed keeps the two threads
> on a single cpu then it means that both aren't ready at the same time,
> e.g. due to lock contention or other things.

I can force it to use both CPUs now, but even with 200% utilization it is
2s slower to run this stupid ubenchmark than on 1 CPU without threads.

Servus,
      Daniel
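
A minimal sketch of the bundling approach Dave describes above, in plain C
with pthreads; it is illustrative, not GIMP code. WIDTH, HEIGHT,
BUNDLE_SIZE, and process_pixel() are hypothetical stand-ins for the real
workload. The point is that each worker claims a whole bundle of contiguous
pixels per lock round-trip, so dispatch overhead is amortized over
BUNDLE_SIZE pixels instead of being paid per pixel.

    /* Coarse-grained task bundling: workers pull bundles of pixels,
     * not single pixels, from a shared counter.
     * Build with: cc -std=c99 bundles.c -lpthread */
    #include <pthread.h>
    #include <stdio.h>

    #define WIDTH       1024
    #define HEIGHT      1024
    #define NUM_PIXELS  (WIDTH * HEIGHT)
    #define BUNDLE_SIZE 4096   /* tune: bigger bundles, less dispatch overhead */
    #define NUM_THREADS 2

    static float image[NUM_PIXELS];
    static long  next_pixel = 0;   /* index of the first unclaimed pixel */
    static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Stand-in for the real per-pixel computation. */
    static void process_pixel(long i)
    {
        image[i] = (float)i * 0.5f;
    }

    static void *worker(void *arg)
    {
        (void)arg;
        for (;;) {
            /* Claim a whole bundle under the lock: one lock round-trip
             * per BUNDLE_SIZE pixels of work, not per pixel. */
            pthread_mutex_lock(&queue_lock);
            long start = next_pixel;
            next_pixel += BUNDLE_SIZE;
            pthread_mutex_unlock(&queue_lock);

            if (start >= NUM_PIXELS)
                return NULL;

            long end = start + BUNDLE_SIZE;
            if (end > NUM_PIXELS)
                end = NUM_PIXELS;
            for (long i = start; i < end; i++)
                process_pixel(i);
        }
    }

    int main(void)
    {
        pthread_t threads[NUM_THREADS];

        for (int t = 0; t < NUM_THREADS; t++)
            pthread_create(&threads[t], NULL, worker, NULL);
        for (int t = 0; t < NUM_THREADS; t++)
            pthread_join(threads[t], NULL);

        printf("done, last pixel = %f\n", image[NUM_PIXELS - 1]);
        return 0;
    }

For workloads with significant per-pixel variance, the same queue could hand
out shuffled bundle indices instead of contiguous ranges, matching the
"random set of pixels" variant; shrinking BUNDLE_SIZE trades load imbalance
against dispatch overhead.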
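
For the affinity pinning discussed in the quoted message, the command-line
route is taskset from schedutils, e.g. "taskset -c 0 ./ubenchmark" to launch
a process pinned to CPU 0, or "taskset -p -c 0 <pid>" for one that is
already running. The same can be done programmatically; a minimal sketch,
assuming Linux and glibc's sched_setaffinity(2), which is the interface
taskset uses:

    /* Pin the calling process to CPU 0. Linux-specific. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
        cpu_set_t mask;

        CPU_ZERO(&mask);
        CPU_SET(0, &mask);   /* allow CPU 0 only */

        /* pid 0 means "the calling process" */
        if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
            perror("sched_setaffinity");
            return 1;
        }

        /* ... run the workload; the scheduler keeps it on CPU 0 ... */
        return 0;
    }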