Hi,

Thanks again very much for these clear-cut answers.

I think I'll try to implement the partitioning despite all the difficulties
you raise about it in this thread, because I can't find any other viable
solution for this situation right now. It will force me to change the data
model to work around the inheritance foreign key issue, but I will at least
test it, because we have limited resources and can't afford many servers
or the like to boost performance...
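To make the constraint concrete, this is roughly the limitation I'm working
around: a foreign key that references the parent of an inheritance hierarchy
never sees rows stored in the children. A minimal sketch; all table and
column names here are made up, not my real schema:

  -- Hypothetical parent and monthly child (illustrative names only).
  CREATE TABLE measure (
      id           bigint PRIMARY KEY,
      created_date date NOT NULL
  );

  CREATE TABLE measure_2011_03 (
      CHECK (created_date >= DATE '2011-03-01'
         AND created_date <  DATE '2011-04-01')
  ) INHERITS (measure);

  -- A table that wants to reference measures by id.
  CREATE TABLE sample (
      id         bigint PRIMARY KEY,
      measure_id bigint REFERENCES measure (id)
  );

  -- The row lives in the child, so the foreign key check on the parent
  -- cannot see it and the second insert fails.
  INSERT INTO measure_2011_03 VALUES (1, DATE '2011-03-15');
  INSERT INTO sample VALUES (1, 1);  -- ERROR: foreign key violation

That is the issue forcing the data model change.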
Best Regards
On 15/03/2011 14:18, Shaun Thomas wrote:
On 03/15/2011 05:10 AM, Samba GUEYE wrote:
1. Why is "... partitioning is not a good idea ...", as you said, Robert,
and why did Conor say "... I grant that it would be better to never need
to do that"?
There are a number of difficulties the planner has with partitioned
tables. Until very recently, MAX and MIN would scan every single
partition, even if you performed the aggregate on the constraint column.
Basically, quite a few queries will either not get an ideal execution
plan, or act in unexpected ways, unless you physically target the
exact partition you want.
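As a rough illustration of what "target the exact partition" means with an
inheritance-based setup (the table and column names below are just examples,
not anything from your schema):

  -- Querying the parent can touch every child partition:
  SELECT max(created_date) FROM measurements;

  -- Hitting the exact child avoids the problem entirely:
  SELECT max(created_date) FROM measurements_2011_03;

  -- With constraint exclusion on, the planner can at least prune
  -- children whose CHECK constraints contradict the WHERE clause:
  SET constraint_exclusion = partition;
  EXPLAIN SELECT *
    FROM measurements
   WHERE created_date >= DATE '2011-03-01'
     AND created_date <  DATE '2011-04-01';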
Even though we have several tables over 50 million rows, I'm reluctant
to partition them, because we have a very transaction-intensive database
and can't risk the possible penalties.
2. Is there another way or strategy to deal with very large tables
(over 100,000,000 rows per year in one table) beyond indexing and
partitioning?
What you have is a very good candidate for partitioning... if you can
effectively guarantee a column to partition the data on. If you're
getting 100M rows per year, I could easily see using some kind of
created_date column and having one partition per month.
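If you go that route on 8.x/9.0, the usual mechanism is table inheritance
plus CHECK constraints and an insert trigger. A minimal sketch, with made-up
names (measurements, created_date) standing in for your actual schema:

  -- Illustrative parent table.
  CREATE TABLE measurements (
      id           bigserial,
      created_date date NOT NULL,
      payload      text
  );

  -- One child per month; the CHECK constraint is what lets the
  -- planner exclude partitions that cannot match a date range.
  CREATE TABLE measurements_2011_03 (
      CHECK (created_date >= DATE '2011-03-01'
         AND created_date <  DATE '2011-04-01')
  ) INHERITS (measurements);

  CREATE INDEX measurements_2011_03_created_idx
      ON measurements_2011_03 (created_date);

  -- Route inserts on the parent into the right child.
  CREATE OR REPLACE FUNCTION measurements_insert() RETURNS trigger AS $$
  BEGIN
      IF NEW.created_date >= DATE '2011-03-01'
         AND NEW.created_date < DATE '2011-04-01' THEN
          INSERT INTO measurements_2011_03 VALUES (NEW.*);
      ELSE
          RAISE EXCEPTION 'no partition for date %', NEW.created_date;
      END IF;
      RETURN NULL;  -- keep the row out of the parent
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER measurements_insert_trg
      BEFORE INSERT ON measurements
      FOR EACH ROW EXECUTE PROCEDURE measurements_insert();

Bear in mind that indexes and primary/foreign key constraints do not
propagate to the children, which is the same limitation you ran into.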
One of the things we hate most about very large tables is the amount
of time necessary to vacuum or index them. CPU and disk IO can only go
so fast, so eventually you encounter a point where it can take hours
to index a single column. If you let your table get too big, your
maintenance costs will be prohibitive, and partitioning may be
required at that point.
As an example, we had a table that was over 100M rows, and we had
enough memory that the entire table fit in the system cache. Even so,
rebuilding the indexes on that table took about an hour and ten
minutes *per index*. We knew this would happen and ran the reindexes in
parallel, which we confirmed by watching five of our CPUs sit at 99%
utilization for the whole interval.
That wouldn't have happened if the table were partitioned.
3. If you had to quantify a limit of numbers of rows per table in a
single postgresql database server what would you say?
I'd personally avoid having any tables over 10 million rows. We have
quad Xeon E7450s, tons of RAM, and even NVRAM PCI cards to reduce IO
contention, and still, large tables are a nuisance. Even the best CPU
will balk at processing 10 million rows quickly.
And yes. Good queries and design will help. Always limiting result
sets will help. Efficient, highly selective indexes will help. But
maintenance grows linearly, despite our best efforts. The only way to
sidestep that issue is to partition tables or rewrite your application
to scale horizontally via data sharding or some other shared-nothing
cluster with plProxy, GridSQL or PGPool.
You'll have this problem with any modern database. Big tables are a
pain in everybody's asses.
It's too bad PostgreSQL can't assign one thread per data-file and
merge the results.