Hi,

"David Bonnell" <dave@xxxxxxxxxxxxxxx> writes:

> It sounds like the granularity of parallelism is too fine. That is,
> each "task" is too short and the overhead of task dispatching (your
> task queue processing, the kernel's thread context switching, any IPC
> required, etc.) is longer than the duration of a single task.
>
> I hit the same problem a decade and a half ago when I worked on
> distributed parallel ray tracing systems for my post graduate
> thesis. If each task is a pixel then you may want to consider
> increasing this to a (configurable size) bundle of pixels.
> Depending on the algorithm being parallelized the bundle may contain
> contiguous pixels (if processing of each pixel requires
> approximately uniform processor time) or a random set of pixels (if
> there is, or can potentially be, significant variance in per-pixel
> processing time).

The task is not a single pixel but a single tile (usually a region
of 64x64 pixels). GIMP processes pixel regions by iterating over the
tiles. The multi-threaded pixel processor uses a configurable number
of threads. Each thread obtains a lock on the pixel region, takes a
pointer to the next tile from the queue, releases the lock, processes
the tile and starts over. This goes on until all tiles are
processed. The main thread blocks until the queue is empty and all
threads have finished their jobs. If a progress callback has been
specified, the main thread wakes up regularly and updates the
progress bar.


Sven
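
For illustration, the tile-queue scheme described above can be sketched roughly as follows. This is a minimal Python sketch of the pattern only, not GIMP's actual C code: the names PixelRegion, worker and process_region, the list used as the tile queue, and the 0.1 second wake-up interval are all assumptions made for this example.

```python
import threading

TILE_SIZE = 64  # usual tile edge length mentioned above


class PixelRegion:
    """Hypothetical stand-in for a pixel region: one lock plus a tile queue."""

    def __init__(self, width, height, tile_size=TILE_SIZE):
        self.lock = threading.Lock()
        # Queue of (x, y, w, h) rectangles; edge tiles may be smaller.
        self.tiles = [
            (x, y, min(tile_size, width - x), min(tile_size, height - y))
            for y in range(0, height, tile_size)
            for x in range(0, width, tile_size)
        ]
        self.total = len(self.tiles)
        self.done = 0


def worker(region, process_tile):
    """Lock the region, take the next tile, unlock, process, start over."""
    while True:
        with region.lock:
            if not region.tiles:
                return  # queue is empty, this thread is finished
            tile = region.tiles.pop()
        process_tile(tile)  # the expensive part runs without the lock
        with region.lock:
            region.done += 1


def process_region(region, process_tile, num_threads=2,
                   progress=None, interval=0.1):
    """Block until all tiles are processed, reporting progress if asked."""

    def report():
        if progress is not None:
            with region.lock:
                fraction = region.done / region.total if region.total else 1.0
            progress(fraction)

    threads = [
        threading.Thread(target=worker, args=(region, process_tile))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        # Wake up regularly so the progress callback can be invoked.
        while t.is_alive():
            t.join(timeout=interval)
            report()
    report()  # final update once every worker has finished
```

Because the lock is held only while a tile is taken from the queue, the per-task overhead stays small compared to the work done on a 64x64 tile, which is the granularity point raised in the quoted message.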