Re: Getting an index scan to be a parallel index scan

On Thu, Feb 2, 2023 at 1:54 PM Alex Kaiser <alextkaiser@xxxxxxxxx> wrote:
> Thanks for the explanation. Yes, that is the query plan I was imagining. I do see how chopping it up could result in an unfair distribution. But my counter to that would be: wouldn't chopping it up still be better than not chopping it up? If things do happen to work out to be fair, it's now X times as fast; if things are very unfair, then you haven't really lost much (besides the parallel overhead) compared to the non-parallel query. Or maybe it should be possible to do the parallel query if there were some statistics (either normal ones or extended ones) that told the planner that the result would probably be fair?

Maybe, but unfairness multiplies if it's part of a larger plan; what
if the output of those nodes is the input to much more work, but now
THAT work is being done by one process?  But yeah, statistics could
help with that.  I'm vaguely aware that other systems that do more
partition-based parallelism spend a lot of effort on that sort of
thinking.
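
To make that concrete, here's roughly the plan shape I have in mind
(purely hypothetical, not from a real run; "orders" and "customers"
are made-up tables).  If the partial scans under the join are unfair,
the worker that happens to receive most of the rows also does most of
the work sitting on top of them:

 Finalize Aggregate
   ->  Gather
         ->  Partial Aggregate
               ->  Hash Join
                     ->  Parallel Append
                           ->  Parallel Index Scan on orders_p0
                           ->  Parallel Index Scan on orders_p1
                     ->  Hash
                           ->  Seq Scan on customers

If the Parallel Append feeds most of its rows to one worker, that
same worker also does most of the hash probing and aggregation above
it.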

> Though I do agree that the "work stealing" option would be the most efficient, it would be a lot more complicated to code up.

Yeah.  I probably used the wrong word; what I was describing is
(something like) page-based parallelism, where the input gets chopped
up into arbitrary chunks and handed out to consumers on demand, but
we don't know anything about the values in those chunks.  That allows
for many interesting kinds of plans, and it's nice because it's fair.
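
For contrast, that's the kind of thing you get from a Parallel Seq
Scan; a minimal sketch, assuming an unpartitioned copy of the test
table from below called "testing_plain" and default parallel
settings.  The scan hands out heap blocks to whichever worker asks
for the next one, so the work is shared fairly no matter how the
values are distributed:

explain (costs off) select count(*) from testing_plain where info = 42;

 Finalize Aggregate
   ->  Gather
         Workers Planned: 2
         ->  Partial Aggregate
               ->  Parallel Seq Scan on testing_plain
                     Filter: (info = 42)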

Another kind of parallelism is partition-based, which PostgreSQL can
do in a limited sense: we can send workers into different partitions
of a table (what we can't do is partition the table on-the-fly, which
is central to most parallelism in some other systems).  Let's see:

CREATE TABLE testing(
   id INT,
   info INT,
   data_one TEXT,
   data_two TEXT,
   primary key(id, info)
) partition by hash (id);
create table testing_p0 partition of testing
  for values with (modulus 2, remainder 0);
create table testing_p1 partition of testing
  for values with (modulus 2, remainder 1);
INSERT INTO testing(id, info, data_one, data_two)
SELECT idx, idx, md5(random()::text), md5(random()::text)
FROM generate_series(1,10000000) idx;
analyze;

explain select count(*) from testing where id in
(1608377,5449811,5334677,5458230,2053195,3572313,1949724,3559988,5061560,8479775,
...);

 Aggregate
   ->  Append
         ->  Index Only Scan using testing_p0_pkey on testing_p0 testing_1
         ->  Index Only Scan using testing_p1_pkey on testing_p1 testing_2

Hmph.  I can't seem to convince it to use Parallel Append.  I think it
might be because the planner is not smart enough to chop down the =ANY
lists to match the partitions.  One sec...
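
You can do that chopping by hand to see which of the values could
even match a given partition; a rough sketch using the internal
satisfies_hash_partition() helper that hash partition constraints are
built on, with a few of the values from the list above and
testing_p0's modulus/remainder:

select v
from unnest(array[1608377, 5449811, 5334677, 5458230, 2053195]) as v
where satisfies_hash_partition('testing'::regclass, 2, 0, v);

That's the kind of per-partition pruning of the =ANY list I'd want
the planner to do for me.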

OK, I hacked my copy of PostgreSQL to let me set parallel_setup_cost
to negative numbers, and then I told it that parallelism is so awesome
that it makes your queries cost -1000000 timerons before they even
start.  Now I see a plan like:

 Gather  (cost=-999999.57..-987689.45 rows=2000 width=74)
   Workers Planned: 2
   ->  Parallel Append  (cost=0.43..12110.55 rows=832 width=74)
         ->  Parallel Index Scan using testing_p0_pkey on testing_p0 testing_1
         ->  Parallel Index Scan using testing_p1_pkey on testing_p1 testing_2

But it's probing every index for every one of the values in the big
list, not just the ones that have a non-zero chance of finding a
match, which is a waste of cycles.  I think if the planner were
smarter about THAT (as it is for plain old "="), the costing would
have picked the parallel plan naturally.
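
For comparison, that smarter treatment already happens for a single
value; a sketch of what I mean, where testing_pN stands in for
whichever of the two partitions 1608377 hashes to.  The other
partition is pruned away at plan time, so only one index gets probed:

explain (costs off) select * from testing where id = 1608377;

 Index Scan using testing_pN_pkey on testing_pN testing
   Index Cond: (id = 1608377)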

But it's probably not as cool as page-based parallelism, because
parallelism is limited by your partitioning scheme.

If I had more timerons myself, I'd like to try to make parallel
function scans, or parallel CTE scans, work...
