On Sun, Apr 5, 2020 at 2:55 PM Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
>
> Arya F <arya6000@xxxxxxxxx> writes:
> > I have a table with 120 million rows of data spread among 512
> > partitions, partitioned by hash. The id column of the table is a
> > uuid, which is what is being used for the partition hash, and it is
> > also the PK for the table.
> >
> > The table has a text column, which also has a btree index on it. A
> > select query on an identical non-partitioned table takes 0.144
> > seconds, but on the partitioned table it takes 5.689 seconds.
> >
> > Am I missing something in my setup? Or is this expected? I do know
> > that having more than 100 partitions in versions prior to PostgreSQL
> > 12 would cause a major slowdown, but from what I read, PostgreSQL 12
> > addresses that now?
>
> You have your expectations calibrated wrongly, I suspect.
>
> Your default expectation with a table with many partitions should be
> that queries will have to hit all those partitions and will take a
> long time. If the query is such that the system can prove that it
> only needs to access one partition, then it can be fast --- but those
> proof rules are not superlatively bright, and they're especially not
> bright for hash partitioning, since that has so little relationship
> to the WHERE restrictions that practical queries would use. But if
> the query's WHERE isn't restricting the partitioning key at all, as I
> suspect is the case for your query, then there's certainly no chance
> of avoiding a search of all the partitions.
>
> If you showed us the specific table declaration and query you're
> working with, it might be possible to offer more than generalities.
>
> In general, though, partitioning should be a last resort for when
> you've got so much data that you have no other choice. I doubt that
> you are there at all with 100M rows, and you are certainly not at a
> point where using hundreds of partitions is a good idea. They are
> not cost-free, by a very long shot. And when you do partition, you
> typically need to think hard about what the partitioning rule will
> be. I'm afraid that hash partitioning is more of a shiny trap for
> novices than a useful tool, because it doesn't organize the data
> into meaningful sub-groups.
>
> 	regards, tom lane

The table will at some point have more than 1 billion rows; the data stored is international residential addresses. I am trying to figure out a way of spreading the data fairly evenly throughout multiple partitions, but I have been unable to come up with a way of splitting it so that Postgres does not have to search through all the partitions.
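
For anyone following along, here is a minimal sketch of the kind of setup being described. The table and column names (addresses, id, address) and the literals are invented for illustration; this is not Arya's actual declaration:

    -- Hash-partitioned table keyed by uuid, as described above
    -- (hypothetical names).
    CREATE TABLE addresses (
        id      uuid NOT NULL,
        address text,
        PRIMARY KEY (id)
    ) PARTITION BY HASH (id);

    -- One of the 512 partitions; the rest follow the same pattern
    -- with REMAINDER 1 through 511.
    CREATE TABLE addresses_p0 PARTITION OF addresses
        FOR VALUES WITH (MODULUS 512, REMAINDER 0);

    -- The btree index on the text column, cascaded to every partition.
    CREATE INDEX ON addresses (address);

    -- Restricts the partitioning key: the planner can prune to the one
    -- partition this uuid hashes to.
    EXPLAIN SELECT * FROM addresses
    WHERE id = '00000000-0000-0000-0000-000000000000';

    -- Does not restrict the partitioning key: all 512 partitions (and
    -- their btree indexes) must be searched, which matches the slow
    -- case described above.
    EXPLAIN SELECT * FROM addresses WHERE address = '123 Main St';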
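
On Tom's point about meaningful sub-groups: if practical queries on an international address table usually restrict some column such as a country code, one alternative would be to partition on that column instead, so those queries can prune. A rough sketch, assuming a hypothetical country_code column that queries actually filter on:

    -- Hypothetical alternative: list-partition by a column that real
    -- queries restrict. Note the PK must include the partition key,
    -- so id is only unique per country here.
    CREATE TABLE addresses_by_country (
        id           uuid NOT NULL,
        country_code text NOT NULL,
        address      text,
        PRIMARY KEY (country_code, id)
    ) PARTITION BY LIST (country_code);

    CREATE TABLE addresses_us PARTITION OF addresses_by_country
        FOR VALUES IN ('US');
    -- ... one partition per country, or per group of countries.

    -- Prunes to the 'US' partition, because the WHERE clause restricts
    -- the partitioning key.
    EXPLAIN SELECT * FROM addresses_by_country
    WHERE country_code = 'US' AND address = '123 Main St';

This only helps if the WHERE clauses of real queries include country_code; if they filter on the address text alone, every partition still has to be searched, as Tom explains above.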