Re: Are there any options to parallelize queries?

Craig Ringer <ringerc@xxxxxxxxxxxxx> · Wed, 22 Aug 2012 11:24:38 +0800

On 08/21/2012 04:45 PM, Seref Arikan wrote:

> Parallel software frameworks such as Erlang's OTP or Scala's Akka do
> help a lot, but it would be a lot better if I could feed those
> frameworks with data faster. So, what options do I have to execute
> queries in parallel, assuming a transactional system running on
> postgresql?

AFAIK, native support for parallelisation of query execution is currently 
almost non-existent in Pg. You generally have to break your queries up 
into smaller queries that do part of the work, run them in parallel, and 
integrate the results together client-side.

There are some tools that can help with this. For example, I think 
PgPool-II has some parallelisation features, though I've never used 
them. Discussion I've seen on this list suggests that many people handle 
it in their code directly.
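
For illustration, here is a minimal sketch of that split / run in parallel / 
merge client-side approach, in Python. Everything named in it is an 
assumption made up for the example: the `orders` table, its `id` and 
`amount` columns, and the `run_chunk` callable, which in a real system 
would open its own connection and run one range-restricted query. A fake 
in-memory table stands in for the database so the sketch runs on its own.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(lo, hi, parts):
    """Split the half-open range [lo, hi) into up to `parts` contiguous chunks."""
    if parts < 1:
        raise ValueError("parts must be >= 1")
    step = max(1, -(-(hi - lo) // parts))  # ceiling division, at least 1
    return [(start, min(start + step, hi)) for start in range(lo, hi, step)]


def parallel_sum(run_chunk, lo, hi, workers=4):
    """Call run_chunk(start, end) for each chunk concurrently and add the results.

    run_chunk is a hypothetical callable supplied by the caller. Against a
    real server it would use its OWN connection and run something like
        SELECT coalesce(sum(amount), 0) FROM orders WHERE id >= %s AND id < %s
    (table and column names are invented for this example).
    """
    chunks = split_range(lo, hi, workers)
    if not chunks:
        return 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is one smaller query; the merge step here is a plain sum.
        partials = pool.map(lambda chunk: run_chunk(*chunk), chunks)
        return sum(partials)


# Stand-in for the database so the sketch runs without a server.
FAKE_TABLE = {i: i * 10 for i in range(1, 101)}  # id -> amount


def fake_run_chunk(start, end):
    """Pretend to run the range-restricted query against FAKE_TABLE."""
    return sum(v for k, v in FAKE_TABLE.items() if start <= k < end)


if __name__ == "__main__":
    print(parallel_sum(fake_run_chunk, 1, 101, workers=4))
```

Threads are enough on the client side here because each worker mostly 
waits on the server. Note that each connection runs in its own 
transaction with its own snapshot, so on a busy transactional system the 
partial results may not reflect exactly the same state of the data.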

Note that Pg is *very* good at concurrently running many queries, with 
features like synchronized scans. The whole DB is written around fast 
concurrent execution of queries, and it'll happily use every CPU and I/O 
resource you have. However, individual queries cannot use multiple CPUs 
or I/O "threads"; you need many queries running in parallel to use the 
hardware's resources fully.

As far as I know the only native in-query parallelisation Pg offers is 
via effective_io_concurrency, and currently that only affects bitmap 
heap scans:

    http://archives.postgresql.org/pgsql-general/2009-10/msg00671.php

... not seqscans or other access methods.

Execution of each query is done with a single process running a single 
thread, so there's no CPU parallelism except where the compiler can 
introduce some behind the scenes - which isn't much. I/O isn't 
parallelised either: not across invocations of nested loops, not by 
splitting seqscans up into chunks, and so on.

There are some upsides to this limitation, though:

- The Pg code is easier to understand, maintain, and fix

- It's easier to add features

- It's easier to get right, so it's less buggy and more
  reliable.

As the world goes more and more parallel, Pg is likely to follow at some 
point, but it's going to be a mammoth job. I don't see anyone 
volunteering the many months of their free time required, there's nobody 
being funded to work on it, and I don't see any of the commercial Pg 
forks that've added parallel features trying to merge their work back 
into mainline.

If you have a commercial need, perhaps you can find time to fund work on 
something that'd help you out, like honouring effective_io_concurrency 
for sequential scans?

--
Craig Ringer

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)