Re: What's a reasonable maximum number for table partitions?

Bill Moran <wmoran@xxxxxxxxxxxxxxxxx> · Fri, 13 Feb 2015 17:56:31 -0500

On Sat, 14 Feb 2015 11:27:52 +1300
Tim Uckun <timuckun@xxxxxxxxx> wrote:

> This might get pretty crazy if I am doing queries like WHERE client_id in
> () or when I am trying to join some table with the client table.   Maybe I
> can precalculate the ids that are going to go into each partition and set
> the constraint as where client_id in (some_huge_list).

The general caveat is if client_id IN (some huge list) then the list is
probably going to include most or all of the partitions, PostgreSQL will
have to look at all of them anyway, so giving it the hint probably isn't
helpful anyway.

Obviously, there are corner cases to every generalization, but overall,
the research I did last year showed that partitioning was a big win when
frequent queries could narrow the number of partitions down to just a few,
and in those cases, constructing a WHERE clause that created the desired
effect was pretty easy.  In our case, since the case of selecting from
more than one table partition was unusual, we decided not to put the
effort into optimizing those queries.

This is why it's SO important to partition on the right column. Getting
the wrong column can have you doing all the work of partitioning with
none or few of the benefits. Think of it this way, if I have an orders
table, I can partition on order_id or customer_id or product_id. There
are probably other options, but let's just consider those for this
exercise.

If I partition on order_id, then the case of selecting a single order
when the order_id is known will be very fast, and queries that select
based on product or customer will benefit little if any at all.

If I partition on customer_id, then customers will be able to find all
of their orders very efficiently, but queries that look for all orders
of a specific product or searches for a specific order_id won't benefit
much, if any.

If I partition on product_id, then queries to find out who's buying a
particular product will be very efficient, but queries based on customer
or order_id won't benefit much, if any.

Which one you choose is dependent on your expected workload, and your
performance requirements. You may decide that customers need speed the
most, and that crunching data on products is something that it's OK
to be a little slower. You may find that the most frequent query is to
search by order_id, so overall your application will see the most
benefit by super-optimizing that request. There's no one answer that's
right for every situation ... at least not in my experience.

> On Sat, Feb 14, 2015 at 11:15 AM, Bill Moran <wmoran@xxxxxxxxxxxxxxxxx>
> wrote:
> 
> > On Sat, 14 Feb 2015 11:14:10 +1300
> > Tim Uckun <timuckun@xxxxxxxxx> wrote:
> >
> > > If I used modulo arithmetic how would the query optimizer know which
> > table
> > > to include and exclude? For example say I did modulo 100 based on the
> > field
> > > client_id.  I create a base table with the trigger to insert the data
> > into
> > > the proper child table. Each table has the constraint (client_id % 100)
> > = X
> > >
> > > So if I do select from base table where client_id = 10  would postgres
> > know
> > > to only select from client_table_10? Normally I would always have a
> > > client_id in my queries so hopefully the this could be very efficient.
> >
> > Unless the newest versions of PostgreSQL has improved on this, you have to
> > give the planner just a bit of a hint ... you're query should look like:
> >
> > SELET ... WHERE client_id = 10 AND client_id % 100 = 10;
> >
> > The part after the AND looks silly and redundant, but it guarantees that
> > the planner will consider the partition layout when it plans the query,
> > and in every test that I've run the result will be that the planner only
> > looks at the one child table.
> >
> > > On Sat, Feb 14, 2015 at 5:12 AM, Vick Khera <vivek@xxxxxxxxx> wrote:
> > >
> > > >
> > > > On Thu, Feb 12, 2015 at 7:44 PM, Tim Uckun <timuckun@xxxxxxxxx> wrote:
> > > >
> > > >> Does anybody have experience with huge number of partitions if so
> > where
> > > >> did you start running into trouble?
> > > >>
> > > >
> > > > I use an arbitrary 100-way split for a lot of tracking info. Just
> > modulo
> > > > 100 on the ID column. I've never had any issues with that. If you can
> > > > adjust your queries to pick the right partition ahead of time, which I
> > am
> > > > able to do for many queries, the number of partitions shouldn't matter
> > > > much. Only rarely do I need to query the primary table.
> > > >
> > > > I don't think your plan for 365 partitions is outrageous on modern
> > large
> > > > hardware. For 1000 partitions, I don't know. It will depend on how you
> > can
> > > > optimize your queries before giving them to postgres.
> > > >
> >
> >
> > --
> > Bill Moran
> >

-- 
Bill Moran

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general