Query Planner not taking advantage of HASH PARTITION

Benjamin Tingle <ben@xxxxxxxxxx> · Thu, 14 Apr 2022 16:36:42 -0700

Greetings Postgres Developers,
I've recently started taking advantage of the PARTITION BY HASH feature for my database system. It's a really great fit since my tables can get quite large (900M+ rows for some) and splitting them up into manageable chunks should let me upload to them without having to update an enormous index every time. What's more, since each partition has a write lock independent of the parent table, it should theoretically be possible to perform a parallelized insert operation, provided the data to be added is partitioned beforehand.

What has been disappointing is that the query planner doesn't seem to recognize this potential. For example, if I have a large list of input data, and I want to perform a select operation across the target table:

  -- target table is hashed on 'textfield' & has a unique index on 'textfield'
  select * from temp_data td left join target tg on td.textfield = tg.textfield;

I would expect to get a query plan like this:

  partition temp_data
  parallel scan on
    target_p0 using target_p0_textfield_uniq_idx against temp_data_p0
    target_p1 using target_p1_textfield_uniq_idx against temp_data_p1
    target_p2 using target_p2_textfield_uniq_idx against temp_data_p2
    ...

Instead, I get a seemingly terrible plan like this:

  hash temp_data
  sequential scan on
    target_p0 against temp_data
    target_p1 against temp_data
    target_p2 against temp_data
    ...

It doesn't even make use of the index on the textfield! Instead, it opts to hash all of temp_data and perform a sequential scan against it.

It doesn't help if I partition temp_data by textfield beforehand either (using the same scheme as the target table). It still opts to concatenate all of temp_data, hash it, then perform a sequential scan against the target partitions.

On insert the behaviour is better but it still opts for a sequential insert instead of a parallel one. 

Does the query planner know something I don't? It's my intuition that it should be faster to do a rough counting sort (partition by hash) first, and then do N smaller more accurate sorts in parallel afterwards.

Currently I am creating a custom script(s) to emulate my desired behaviour, but it would be nice if there was a way to get the query planner to do this automatically. Any tricks to do this would be much appreciated!

-Ben