Parallelizing slow queries across multiple cores (PostgreSQL + Gearman)

We have a few slow queries that use sequential scans on tables that have plenty of indexes (for other queries), on a box with a lot of RAM and 13 active cores (don't ask), so I was curious how to put this environment to better use. The result is (maybe) interesting, especially since PostgreSQL has recently been getting better at executing many queries in parallel and typical servers will have more than 16 cores very soon.

The simplified scenario was a query like

select * from t where foo ~ 'bla';

on a table with approx. 9 million rows, taking around 12 seconds in the best case. The table had a bigserial primary key "eid" with a btree index, which seemed to be the most suitable starting point.

The current value range of eid was partitioned into intervals of equal size, and depending on the number of rows in these intervals, one or more of them were assigned to each worker process (this is done once per day, not before each query!); a sketch of this step follows the example below:

worker 1: select * from t where foo ~ 'bla' and (eid >= 300000 and eid < 400000)
worker 2: select * from t where foo ~ 'bla' and (eid >= 500000 and eid < 600000 or eid >= 1100000 and eid < 1200000 ...)
...
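
For the once-a-day range computation, something along these lines could work (just a minimal sketch; the bucket count, the number of workers, the connection parameters and the greedy assignment are assumptions for illustration, not our actual script):

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

my $buckets = 64;   # number of eid intervals (hypothetical)
my $workers = 16;   # number of Gearman workers (hypothetical)

my $dbh = DBI->connect('dbi:Pg:dbname=mydb', 'user', '', { RaiseError => 1 });

# One pass over the table: row count and actual eid bounds per interval.
my $rows = $dbh->selectall_arrayref(q{
    select width_bucket(eid, b.lo, b.hi + 1, ?) as bucket,
           min(eid) as lo_eid, max(eid) as hi_eid, count(*) as n
      from t, (select min(eid) as lo, max(eid) as hi from t) b
     group by 1 order by 1
}, { Slice => {} }, $buckets);

# Greedy balancing: always hand the biggest remaining interval to the
# worker with the fewest rows so far.
my @load = (0) x $workers;
my @assignment;                       # $assignment[$w] = [ [lo, hi), ... ]
for my $r (sort { $b->{n} <=> $a->{n} } @$rows) {
    my ($w) = sort { $load[$a] <=> $load[$b] } 0 .. $workers - 1;
    push @{ $assignment[$w] }, [ $r->{lo_eid}, $r->{hi_eid} + 1 ];
    $load[$w] += $r->{n};
}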

Instead of one sequential scan, a bunch of worker processes (implemented with Gearman/Perl) would then execute these queries (one each) with a plan like:
 Bitmap Heap Scan on t ...
  Recheck Cond: eid >= 300000 and eid < 400000 ..
  Filter: foo ~ 'bla'
    Bitmap Index Scan on t_pkey ...
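
For reference, a worker for this scheme might look roughly like the sketch below (the function name scan_chunk, the JSON argument format and the connection details are made up for illustration, not our production code):

#!/usr/bin/perl
use strict;
use warnings;
use DBI;
use JSON::XS qw(encode_json decode_json);
use Gearman::Worker;

my $dbh = DBI->connect('dbi:Pg:dbname=mydb', 'user', '', { RaiseError => 1 });

my $worker = Gearman::Worker->new;
$worker->job_servers('127.0.0.1:4730');

# Each job carries the regex and the eid ranges this worker should scan.
$worker->register_function(scan_chunk => sub {
    my $job  = shift;
    my $args = decode_json($job->arg);  # { pattern => 'bla', ranges => [[lo, hi], ...] }

    # The ranges are integers produced by our own partitioning step,
    # so interpolating them into the SQL is safe here.
    my $cond = join ' or ',
        map { "(eid >= $_->[0] and eid < $_->[1])" } @{ $args->{ranges} };

    my $rows = $dbh->selectall_arrayref(
        "select * from t where foo ~ ? and ($cond)",
        undef, $args->{pattern}
    );
    return encode_json($rows);
});

$worker->work while 1;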

This gave a speedup factor of about 2 when the box was otherwise idle, i.e. the total time to split the query, distribute the jobs to the worker processes, execute the queries in parallel, collect the results and send them back to the client was now ~6 seconds.
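
The client side that splits the query, fans the jobs out to the workers and collects the results could look like this (again only a sketch under the same assumptions):

#!/usr/bin/perl
use strict;
use warnings;
use JSON::XS qw(encode_json decode_json);
use Gearman::Client;

# @assignment comes from the daily partitioning step (one entry per worker).
my @assignment = (
    [ [ 300_000,   400_000 ] ],
    [ [ 500_000,   600_000 ], [ 1_100_000, 1_200_000 ] ],
);

my $client = Gearman::Client->new;
$client->job_servers('127.0.0.1:4730');

my @results;
my $taskset = $client->new_task_set;
for my $ranges (@assignment) {
    $taskset->add_task(
        scan_chunk => encode_json({ pattern => 'bla', ranges => $ranges }),
        { on_complete => sub { push @results, @{ decode_json(${ $_[0] }) } } },
    );
}
$taskset->wait;   # block until all chunk queries have finished

printf "got %d rows from %d parallel chunk queries\n",
    scalar @results, scalar @assignment;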

Observations:
- currently, the speedup is almost the same for anything between ~10 and 80+ workers (~12s down to ~7s on average, best run ~6s)
- the effective processing speed of the workers varied greatly: the fastest did ~3x to ~10x the rows/second of the slowest (real time divided by the number of rows to work on)
- the fastest workers went as fast as the sequential scan (in rows per second) sometimes, but not always (most likely they were actually running alone then for some reason)
- in each new run, the workers finished in a completely different order (even though they had the same parts of the table to work on and thus identical queries), so perhaps the partitioning of the workload is quite good already and it's more of a scheduling issue (Gearman? shared buffer contention?)
- the Linux I/O scheduler had a visible effect: "noop" was ~10% better than "deadline" (others not tried yet), but this is typical for random writes and RAID controllers that manage their writeback cache as they like (it's a wasted effort to reorder writes before they hit the RAID controller)
- CLUSTER might help a lot (the workers should hit fewer pages and need fewer shared resources?), but I haven't tested it
- our query performance is not limited by disk I/O (as is usually the case, I guess), since we have most of the tables/indexes in RAM; whether it scales as well (or better?) with a proper disk subsystem and less RAM is unknown

I hope there is still some room for improvement so these queries can execute faster in parallel and scale better; these first results are quite encouraging. I'd love to put 32+ cores to use for single queries. Perhaps something like this could be built into PostgreSQL at some point? There's no complicated multithreading/locking involved, and Postgres has enough statistics available to distribute the work even better. It should also be easy to implement this on top of any of the various connection pooling solutions.

Has anyone done similar work in light of the upcoming many-core CPUs/systems? Any better results than a 2x improvement?

Apologies if this is a well-known and widely used technique already. ;-)

Marinos.

PS. yes, for the example query we could use tsearch2 etc., but it has drawbacks in our specific case (indexing overhead, no real regexps possible) and it's only an example anyway ...

