Steve Atkins <steve@xxxxxxxxxxx> writes:
> I'm doing some reporting-type work with PG, with the vast
> majority of queries hitting upwards of 25% of the table, so
> being executed as seq scans.
> ...
> It would be really nice to be able to do all the work with a
> single pass over the table, executing all the queries in
> parallel in that pass.  They're pretty simple queries, mostly,
> just some aggregates and a simple where clause.

> There are some fairly obvious ways to merge multiple
> queries to do that at a SQL level - converting each query
> into a function and passing each row from a select * to
> each of the functions would be one of the less ugly.

> Or I could fire off all the queries simultaneously and hope
> they stay in close-enough lockstep through a single pass
> through the table to be able to share most of the IO.

I have not tried this sort of thing, but right offhand I like the
second alternative.  The "hope" is more well-founded than you seem to
think: whichever process is currently ahead will be slowed by
requesting I/O, while processes that are behind will find the pages
they need already in shared buffers.  You should definitely see just
one read of each table page as the parallel scans advance, assuming
you don't have an unreasonably small number of buffers.

Another reason, if you have more than one CPU in your machine, is that
multiple processes can make use of multiple CPUs, whereas the
one-fancy-query approach doesn't parallelize (at least not without
Bizgres or some such).

And lastly, you can just try it without sweating hard to convert the
queries ;-).  So try it and let us know how it goes.

			regards, tom lane
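
For concreteness, one common way to merge such aggregate queries at
the SQL level is a CASE-based rewrite rather than the
function-per-query scheme described in the quoted message: fold each
WHERE clause into a CASE expression so a single seq scan answers all
of the queries at once.  This is only a sketch; the table and column
names (log, status, bytes) and the psycopg2 driver are assumptions
for illustration, not anything from the original thread:

    # Three aggregates with different WHERE clauses, folded into one
    # query.  count(expr) ignores NULLs, and a CASE with no ELSE
    # yields NULL, so each aggregate still sees only "its" rows.
    import psycopg2

    MERGED = """
        SELECT count(CASE WHEN status = 500 THEN 1 END)   AS errors,
               sum(CASE WHEN status = 200 THEN bytes END) AS ok_bytes,
               max(CASE WHEN status = 404 THEN bytes END) AS big_404
        FROM log
    """

    conn = psycopg2.connect("dbname=reporting")
    cur = conn.cursor()
    cur.execute(MERGED)            # one seq scan over log
    print(cur.fetchone())          # one row: (errors, ok_bytes, big_404)
    conn.close()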
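
And a minimal sketch of the second alternative, firing the original
queries simultaneously so that the concurrent seq scans can share
buffered pages as described above.  The queries, connection string,
and psycopg2 driver are again placeholders:

    # Run each query in its own connection, concurrently.  One
    # connection per thread matters: each connection is a separate
    # backend process, which is what lets the scans overlap.
    import threading
    import psycopg2

    QUERIES = [
        "SELECT count(*) FROM log WHERE status = 500",
        "SELECT sum(bytes) FROM log WHERE status = 200",
    ]

    results = [None] * len(QUERIES)

    def run(i, sql):
        conn = psycopg2.connect("dbname=reporting")
        cur = conn.cursor()
        cur.execute(sql)
        results[i] = cur.fetchone()
        conn.close()

    threads = [threading.Thread(target=run, args=(i, q))
               for i, q in enumerate(QUERIES)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(results)

Whichever backend is ahead pays for the physical read; the others
find the page already in shared buffers, so the table should be read
from disk roughly once.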