Parallelizing slow queries across multiple cores (PostgreSQL + Gearman)

We have a few slow queries that use sequential scans on tables that have plenty of indexes (for other queries), on a box with a lot of RAM and 13 active cores (don't ask), so I was curious how to put this environment to better use. The result is (maybe) interesting, especially since PostgreSQL has recently been getting better at executing many queries in parallel and typical servers will have more than 16 cores very soon.

The simplified scenario was a query like

select * from t where foo ~ 'bla';

on a table with approx. 9 million rows, taking around 12 seconds in the best case. The table had a bigserial primary key "eid" with a btree index, which seemed to be the most suitable starting point.

The current value range of eid was partitioned into intervals of equal size, and depending on the number of rows in these intervals, one or more of them were assigned to each worker process (this is done once per day, not before each query!); a sketch of this step follows the example below:

worker 1: select * from t where foo ~ 'bla' and (eid >= 300000 and eid < 400000)
worker 2: select * from t where foo ~ 'bla' and (eid >= 500000 and eid < 600000 or eid >= 1100000 and eid < 1200000 ...)
...
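
For the once-a-day range computation, something along these lines could work (just a minimal sketch; the bucket count, the number of workers, the connection parameters and the greedy assignment are assumptions for illustration, not our actual script):

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

my $buckets = 64;   # number of eid intervals (hypothetical)
my $workers = 16;   # number of Gearman workers (hypothetical)

my $dbh = DBI->connect('dbi:Pg:dbname=mydb', 'user', '', { RaiseError => 1 });

# One pass over the table: row count and actual eid bounds per interval.
my $rows = $dbh->selectall_arrayref(q{
    select width_bucket(eid, b.lo, b.hi + 1, ?) as bucket,
           min(eid) as lo_eid, max(eid) as hi_eid, count(*) as n
      from t, (select min(eid) as lo, max(eid) as hi from t) b
     group by 1 order by 1
}, { Slice => {} }, $buckets);

# Greedy balancing: always hand the biggest remaining interval to the
# worker with the fewest rows so far.
my @load = (0) x $workers;
my @assignment;                       # $assignment[$w] = [ [lo, hi), ... ]
for my $r (sort { $b->{n} <=> $a->{n} } @$rows) {
    my ($w) = sort { $load[$a] <=> $load[$b] } 0 .. $workers - 1;
    push @{ $assignment[$w] }, [ $r->{lo_eid}, $r->{hi_eid} + 1 ];
    $load[$w] += $r->{n};
}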

Instead of one sequential scan, a bunch of worker processes (implemented with Gearman/Perl) would then execute these queries (one each) with a plan like:
 Bitmap Heap Scan on t ...
  Recheck Cond: eid >= 300000 and eid < 400000 ..
  Filter: foo ~ 'bla'
    Bitmap Index Scan on t_pkey ...
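
For reference, a worker for this scheme might look roughly like the sketch below (the function name scan_chunk, the JSON argument format and the connection details are made up for illustration, not our production code):

#!/usr/bin/perl
use strict;
use warnings;
use DBI;
use JSON::XS qw(encode_json decode_json);
use Gearman::Worker;

my $dbh = DBI->connect('dbi:Pg:dbname=mydb', 'user', '', { RaiseError => 1 });

my $worker = Gearman::Worker->new;
$worker->job_servers('127.0.0.1:4730');

# Each job carries the regex and the eid ranges this worker should scan.
$worker->register_function(scan_chunk => sub {
    my $job  = shift;
    my $args = decode_json($job->arg);  # { pattern => 'bla', ranges => [[lo, hi], ...] }

    # The ranges are integers produced by our own partitioning step,
    # so interpolating them into the SQL is safe here.
    my $cond = join ' or ',
        map { "(eid >= $_->[0] and eid < $_->[1])" } @{ $args->{ranges} };

    my $rows = $dbh->selectall_arrayref(
        "select * from t where foo ~ ? and ($cond)",
        undef, $args->{pattern}
    );
    return encode_json($rows);
});

$worker->work while 1;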

This gave a speedup factor of about 2 when the box was otherwise idle, i.e. the total time to split the query, distribute the jobs to the worker processes, execute the queries in parallel, collect the results and send them back to the client was now ~6 seconds.
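
The client side that splits the query, fans the jobs out to the workers and collects the results could look like this (again only a sketch under the same assumptions):

#!/usr/bin/perl
use strict;
use warnings;
use JSON::XS qw(encode_json decode_json);
use Gearman::Client;

# @assignment comes from the daily partitioning step (one entry per worker).
my @assignment = (
    [ [ 300_000,   400_000 ] ],
    [ [ 500_000,   600_000 ], [ 1_100_000, 1_200_000 ] ],
);

my $client = Gearman::Client->new;
$client->job_servers('127.0.0.1:4730');

my @results;
my $taskset = $client->new_task_set;
for my $ranges (@assignment) {
    $taskset->add_task(
        scan_chunk => encode_json({ pattern => 'bla', ranges => $ranges }),
        { on_complete => sub { push @results, @{ decode_json(${ $_[0] }) } } },
    );
}
$taskset->wait;   # block until all chunk queries have finished

printf "got %d rows from %d parallel chunk queries\n",
    scalar @results, scalar @assignment;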

Observations:
- currently, the speedup is almost the same for anything between ~10 and 80+ workers (~12s down to ~7s on average, best run ~6s)
- the effective processing speed of the workers varied greatly: the fastest did ~3x to ~10x the rows/second of the slowest (real time divided by the number of rows to work on)
- the fastest workers went as fast as the sequential scan (in rows per second) sometimes, but not always (most likely they were actually running alone then for some reason)
- in each new run, the workers finished in a completely different order (even though they had the same parts of the table to work on and thus identical queries), so perhaps the partitioning of the workload is quite good already and it's more of a scheduling issue (Gearman? shared buffer contention?)
- the Linux I/O scheduler had a visible effect: "noop" was ~10% better than "deadline" (others not tried yet), but this is typical for random writes and RAID controllers that manage their writeback cache as they like (it's a wasted effort to reorder writes before they hit the RAID controller)
- CLUSTER might help a lot (the workers should hit fewer pages and need fewer shared resources?), but I haven't tested it
- our query performance is not limited by disk I/O (as is usually the case, I guess), since we have most of the tables/indexes in RAM; whether it scales as well (or better?) with a proper disk subsystem and less RAM is unknown

I hope there is still some room for improvement so these queries can execute faster in parallel and scale better; these first results are quite encouraging. I'd love to put 32+ cores to use for single queries. Perhaps something like this could be built into PostgreSQL at some point? There's no complicated multithreading/locking involved, and Postgres has enough statistics available to distribute the work even better. It should also be easy to implement this on top of any of the various connection pooling solutions.

Has anyone done similar work in light of the upcoming many-core CPUs/systems? Any better results than a 2x improvement?

Apologies if this is a well-known and widely used technique already. ;-)

Marinos.

PS. yes, for the example query we could use tsearch2 etc., but it has drawbacks in our specific case (indexing overhead, no real regexps possible) and it's only an example anyway ...

