Re: partitioning for speed, but query planner ignores

Kevin Grittner <kgrittn@xxxxxxxxx> · Wed, 2 Oct 2013 08:19:58 -0700 (PDT)

David Rysdam <drysdam@xxxxxxxxxx> wrote:

> We have a by-our-standards large table (about 40e6 rows). Since it is
> the bottleneck in some places, I thought I'd experiment with
> partitioning.

In my personal experience I have gone into hundreds of millions of
rows with good performance without partitioning.  It's all about
designing good indexes for the workload.

I have only seen partitioning help in two cases:
(1)  There will be bulk deletes of rows, and you know at insert
time which bulk delete the row belongs with.  Dropping a partition
table is a very fast way to delete a large number of rows.
(2)  The bulk of activity will be on a relatively small subset of
the rows at any one time, and you can partition such that the set
of active rows will be in a small number of partitions.

In all other cases, I have only seen partitioning harm performance.
 There is no reason to think that checking the table-level
constraints on every partition table will be faster than descending
through an index tree level.

> The table holds data about certain objects, each of which has an object
> number and some number of historical entries (like account activity at a
> bank, say). The typical usage pattern is: relatively rare inserts that
> happen in the background via an automated process (meaning I don't care
> if they take a little longer) and frequent querying, including some
> where a human is sitting in front of it (i.e. I'd like it to be a lot
> faster).
>
> Our most frequent queries either select "all history for object N" or
> "most recent item for some subset of objects".
>
> Because object number figure so prominently, I thought I'd partition on
> that. To me, it makes the most sense from a load-balancing perspective

Load balancing?  Hitting a single partition more heavily improves
your cache hit ratio.  What sort of benefit are you expecting from
spreading the reads across all the partitions?  *Maybe* that could
help if you carefully placed each partition table on a separate set
of spindles, but usually you are better off having one big RAID so
that every partition is spread across all the spindles
automatically.

> Lower numbers are going to be queried much less often than higher
> numbers.

This suggests to me that you *might* get a performance boost if you
define partitions on object number *ranges*.  It still seems a bit
dubious, but it has a chance.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general