On Thu, Feb 2, 2023 at 1:54 PM Alex Kaiser <alextkaiser@xxxxxxxxx> wrote:
> Thanks for the explanation. Yes, that is the query plan I was imagining.
> I do see how chopping it up could result in an unfair distribution. But
> my counter to that would be that wouldn't chopping it up still be better
> than not. If things do happen to work out to be fair, now it's X times
> as fast, if things are very unfair, then you haven't really lost much
> (besides the parallel overhead) compared to the non-parallel query. Or
> maybe it should be possible to do the parallel query if there were some
> statistics (either normal ones or extended ones) that told the planner
> that the result would probably be fair?

Maybe, but unfairness multiplies if it's part of a larger plan: what if
the output of those nodes is the input to much more work, but now THAT
work is being done by one process? But yeah, statistics could help with
that. I'm vaguely aware that other systems that do more partition-based
parallelism spend a lot of effort on that sort of thinking.

> Though I do agree that the "work stealing" option would be the most
> efficient, but would be a lot more complicated to code up.

Yeah. I probably used the wrong word: what I was describing is (something
like) page-based parallelism, where input gets chopped up into arbitrary
chunks and handed out to consumers on demand, but we don't know anything
about the values in those chunks. That allows for many interesting kinds
of plans, and it's nice because it's fair.

Another kind of parallelism is partition-based, which PostgreSQL can do
in a limited sense: we can send workers into different partitions of a
table (what we can't do is partition the table on the fly, which is
central to most parallelism in some other systems). Let's see:

CREATE TABLE testing(
  id INT,
  info INT,
  data_one TEXT,
  data_two TEXT,
  PRIMARY KEY (id, info)
) PARTITION BY HASH (id);

CREATE TABLE testing_p0 PARTITION OF testing
  FOR VALUES WITH (modulus 2, remainder 0);
CREATE TABLE testing_p1 PARTITION OF testing
  FOR VALUES WITH (modulus 2, remainder 1);

INSERT INTO testing(id, info, data_one, data_two)
SELECT idx, idx, md5(random()::text), md5(random()::text)
FROM generate_series(1, 10000000) idx;

ANALYZE;

EXPLAIN SELECT count(*) FROM testing
WHERE id IN (1608377,5449811,5334677,5458230,2053195,3572313,1949724,
             3559988,5061560,8479775, ...);

 Aggregate
   ->  Append
         ->  Index Only Scan using testing_p0_pkey on testing_p0 testing_1
         ->  Index Only Scan using testing_p1_pkey on testing_p1 testing_2

Hmph. I can't seem to convince it to use Parallel Append. I think it
might be because the planner is not smart enough to chop down the =ANY
lists to match the partitions. One sec...

OK, I hacked my copy of PostgreSQL to let me set parallel_setup_cost to
negative numbers, and then I told it that parallelism is so awesome that
it makes your queries cost -1000000 timerons before they even start. Now
I see a plan like:

 Gather  (cost=-999999.57..-987689.45 rows=2000 width=74)
   Workers Planned: 2
   ->  Parallel Append  (cost=0.43..12110.55 rows=832 width=74)
         ->  Parallel Index Scan using testing_p0_pkey on testing_p0 testing_1
         ->  Parallel Index Scan using testing_p1_pkey on testing_p1 testing_2

But it's probing every index for every one of the values in the big
list, not just the ones that have a non-zero chance of finding a match,
which is a waste of cycles. I think if the planner were smarter about
THAT (as it is for plain old "="), then it would have chosen the
parallel plan naturally, on cost.
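As an aside, to make "chop down the =ANY lists" a bit more concrete: you
can approximate that pruning by hand with satisfies_hash_partition(), an
undocumented support function that (as far as I know) exists mainly so
you can write CHECK constraints matching hash partition bounds. Treat
this as an illustrative sketch of the idea, not how the planner would
actually do it:

-- Which of the =ANY values could match testing_p0 (modulus 2, remainder 0),
-- and which could match testing_p1 (modulus 2, remainder 1)?
SELECT v,
       satisfies_hash_partition('testing'::regclass, 2, 0, v) AS could_be_in_p0,
       satisfies_hash_partition('testing'::regclass, 2, 1, v) AS could_be_in_p1
FROM unnest(ARRAY[1608377, 5449811, 5334677, 5458230, 2053195]) AS v;

A planner that did that per partition would only pay for the index probes
that can actually find a match, which is roughly what would let the
Parallel Append win on cost without any negative-timeron tricks.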
But partition-based parallelism is probably not as cool as page-based
parallelism, because the available parallelism is limited by your
partitioning scheme. If I had more timerons myself, I'd like to try to
make parallel function scans, or parallel CTE scans, work...
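P.S. For anyone who wants to poke at the example above without patching
the server, the closest unpatched approximation I can think of is to zero
out the stock parallel cost GUCs, something like the sketch below. It may
well still not be enough to flip the plan to Parallel Append, precisely
because each worker is still costed as probing both indexes for the whole
=ANY list, but at least it doesn't require negative timerons.

-- Make parallel plans as cheap as the stock settings allow (session only).
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
SET min_parallel_index_scan_size = 0;
SET max_parallel_workers_per_gather = 2;

EXPLAIN SELECT count(*) FROM testing
WHERE id IN (1608377, 5449811, 5334677, 5458230, 2053195);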