Re: Table partitioning problem

On 03/15/2011 05:10 AM, Samba GUEYE wrote:

1. Why is "... partitioning is not a good idea ...", as you said, Robert,
and as Conor put it, "... I grant that it would be better to never need
to do that"?

There are a number of difficulties the planner has with partitioned tables. Until very recently, MIN and MAX would scan every single partition, even when the aggregate was on the constrained (partitioning) column. In general, quite a few queries will either get a less-than-ideal execution plan or behave in unexpected ways unless you physically target the exact partition you want.
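
To give a rough illustration (table names here are invented; the full layout is sketched further down):

  -- On older releases, an aggregate on the parent hits every child,
  -- even though created_date is the partitioning column:
  SELECT max(created_date) FROM measurement;

  -- Aiming at the specific child avoids the problem entirely:
  SELECT max(created_date) FROM measurement_2011_03;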

Even though we have several tables over the 50-million-row mark, I'm reluctant to partition them: ours is a very transaction-intensive database, and we can't risk the possible penalties.

2. Is there another way or strategy to deal with very large tables
(over 100,000,000 rows per year in one table) beyond indexing and
partitioning?

What you have is a very good candidate for partitioning... if you can effectively guarantee a column to partition the data on. If you're getting 100M rows per year, I could easily see some kind of created_date column and then having one partition per month.
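
If you go that route, the setup is a small variation on the inheritance-based scheme in the documentation: children with CHECK constraints plus a routing trigger, so constraint exclusion can prune partitions. All names below are invented; treat it as a sketch, not a drop-in:

  -- Parent table; children inherit its columns and defaults.
  CREATE TABLE measurement (
      id           bigserial,
      created_date timestamp NOT NULL,
      payload      text
  );

  -- One child per month, with a CHECK constraint the planner can
  -- use to exclude the partition from irrelevant queries.
  CREATE TABLE measurement_2011_03 (
      CHECK (created_date >= DATE '2011-03-01'
         AND created_date <  DATE '2011-04-01')
  ) INHERITS (measurement);

  CREATE INDEX measurement_2011_03_created_idx
      ON measurement_2011_03 (created_date);

  -- Route inserts on the parent to the correct child.
  CREATE OR REPLACE FUNCTION measurement_insert() RETURNS trigger AS $$
  BEGIN
      IF NEW.created_date >= DATE '2011-03-01'
         AND NEW.created_date <  DATE '2011-04-01' THEN
          INSERT INTO measurement_2011_03 VALUES (NEW.*);
      ELSE
          RAISE EXCEPTION 'no partition for date %', NEW.created_date;
      END IF;
      RETURN NULL;  -- keep the row out of the parent
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER measurement_insert_trig
      BEFORE INSERT ON measurement
      FOR EACH ROW EXECUTE PROCEDURE measurement_insert();

  -- Constraint exclusion must be enabled for pruning to happen
  -- (it defaults to 'partition' on recent releases).
  SET constraint_exclusion = partition;

The nice side effects: each month stays small enough to index and vacuum quickly, and retiring old data becomes a DROP TABLE instead of a massive DELETE.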

One of the things we hate most about very large tables is the amount of time necessary to vacuum or index them. CPU and disk IO can only go so fast, so eventually you encounter a point where it can take hours to index a single column. If you let your table get too big, your maintenance costs will be prohibitive, and partitioning may be required at that point.

As an example, we have a table that was over 100M rows, and we had enough memory that the entire table sat in system cache. Even so, rebuilding the indexes on that table took about an hour and ten minutes *per index*. We knew this would happen and ran the reindexes in parallel, which we confirmed by watching five of our CPUs sit at 99% utilization for the whole interval.
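
For what it's worth, the "parallel" part was nothing exotic -- essentially one REINDEX per index, each issued from its own session, along these lines (index names invented):

  -- REINDEX locks out writes but not reads, and only locks its own
  -- index exclusively, so separate indexes can be rebuilt at once.
  REINDEX INDEX big_table_pkey;         -- session 1
  REINDEX INDEX big_table_created_idx;  -- session 2
  REINDEX INDEX big_table_account_idx;  -- session 3
  -- ...and so on, one psql session per index.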

That wouldn't have happened if the table were partitioned.

3. If you had to quantify a limit on the number of rows per table in a
single PostgreSQL database server, what would you say?

I'd personally avoid having any tables over 10-million rows. We have quad Xeon E7450s, tons of RAM, and even NVRAM PCI cards to reduce IO contention, and still, large tables are a nuisance. Even the best CPU will balk at processing 10-million rows quickly.

And yes. Good queries and design will help. Always limiting result sets will help. Efficient, highly selective indexes will help. But maintenance grows linearly, despite our best efforts. The only way to sidestep that issue is to partition tables or rewrite your application to scale horizontally via data sharding or some other shared-nothing cluster with plProxy, GridSQL or PGPool.
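
For a taste of the PL/Proxy flavor of that: the proxy database holds thin routing functions along these lines (a sketch loosely following the PL/Proxy tutorial; the cluster name, function, and column are invented), while each partition database carries a normal SQL/plpgsql function with the same name and signature that does the real work:

  -- Runs on the proxy node: hash the username and forward the call
  -- to the matching partition database in the 'userdb' cluster.
  CREATE FUNCTION get_user_email(i_username text)
  RETURNS SETOF text AS $$
      CLUSTER 'userdb';
      RUN ON hashtext(i_username);
  $$ LANGUAGE plproxy;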

You'll have this problem with any modern database. Big tables are a pain in everybody's asses.

It's too bad PostgreSQL can't assign one thread per data-file and merge the results.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@xxxxxxxxx



