Search Postgresql Archives

Re: Should I partition this table?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 10 Jul 2014 07:59:20 -0700 (PDT) AlexK <alkuzo@xxxxxxxxx> wrote:

> My table currently uses up 62 GB of storage, and it has 450 M rows. This
> narrow table has a PK on (ParentID, ChildNumber), and it has between 20K and
> 50K of child rows per parent.
> 
> The data is inserted daily, rarely modified, never deleted. The performance
> of modifications is not an issue. The only select from it is as follows:
> 
> SELECT <column_lis> FROM MyChildTable WHERE ParentID=?
> ORDER BY ChildNumber;
> 
> The selects are frequent, and their performance is essential.
> 
> Would you advise me to partition this table?

In general, yes, given the information you provided.  A parition on
ParentID % $something should improve performance.  Exactly what
$something is will take some experimenting on your part to determine.
In my experience, the scenario you describe is likely to see SELECT
performance improve nearly linerally to the number of partitions, up
to some point that will take exerpimenting to determine.

However, there may other methods of partition and/or rearranging the
data that would be even better, depending on a lot of information you
did not provide.

As an example, how are your selects distributed?  Are they fairly even
across the entire data set?  Or do ParentID become less accessed the
older they get?  If the latter, you'll probably be better served by
lazily archiving infrequently accessed ParentID rows to an archive
table and adjusting the application to search that table only if the
rows weren't found in the primary table.  This is an improvement over
modulous partitioning becuase it's more likely that the frequently
accessed data will be in memory and stay there.

Going even further, since the data is infrequently modified, you might be
better served by putting some sort of cache (memcache, or a custom in-
app cache) in front of the DB and checking it first.  The in-app cache
is always the best because it incurs no network traffic to access, but
the feasibility of doing that depends on the exact nature of the
application.  Say it with me: An RDBMS is not RAM, and trying to use
it like RAM will probably lead to disappointing performance.

-- 
Bill Moran <wmoran@xxxxxxxxxxxxxxxxx>



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux