On Thu, Jul 10, 2014 at 8:20 AM, Bill Moran <wmoran@xxxxxxxxxxxxxxxxx> wrote:
On Thu, 10 Jul 2014 07:59:20 -0700 (PDT) AlexK <alkuzo@xxxxxxxxx> wrote:
> My table currently uses up 62 GB of storage, and it has 450 M rows. This
> narrow table has a PK on (ParentID, ChildNumber), and it has between 20K and
> 50K of child rows per parent.
>
> The data is inserted daily, rarely modified, never deleted. The performance
> of modifications is not an issue. The only select from it is as follows:
>
> SELECT <column_lis> FROM MyChildTable WHERE ParentID=?
> ORDER BY ChildNumber;
>
> The selects are frequent, and their performance is essential.
How is their performance currently?
>> Would you advise me to partition this table?
No, not based on the current info. There is no reason to think partitioning would improve the performance that matters to you. You said that child rows for the same parent are all inserted at the same time, so they should naturally be well-clustered. That will be important for performance once the data exceeds what can be cached. If that clustering did not occur naturally then you might benefit from imposing it, and partitioning might be an import part of doing that. But it sounds like you will not need to worry about that.
In general, yes, given the information you provided. A parition on
ParentID % $something should improve performance.
PostgresSQL's constraint exclusion logic is not smart enough to turn a simple equality into a mod equality. Which means every select query would have to include "AND ParentID % something = (:1 % somthing)" in addition to the primary clause "ParentID=:1", in order to benefit from constraint exclusion. That would be very unnatural, annoying, and error prone. Range partitioning would be better, if any partitioning is needed at all.
Cheers,
Jeff