On 03/15/2011 05:10 AM, Samba GUEYE wrote:
1. Why "... partitionning is not a good idea ..." like you said Robert and Conor "... I grant that it would be better to never need to do that" ?
There are a number of difficulties the planner has with partitioned tables. Only until very recently, MAX and MIN would scan every single partition, even if you performed the action on the constraint column. Basically quite a few queries will either not have an ideal execution plan, or act in unexpected manners unless you physically target the exact partition you want.
Even though we have several tables over the 50-million rows, I'm reluctant to partition them because we have a very transaction-intensive database, and can't risk the possible penalties.
2. Is there another way or strategy to deal with very large tables (over 100 000 000 rows per year in one table) beyond indexing and partitionning?
What you have is a very good candidate for partitioning... if you can effectively guarantee a column to partition the data on. If you're getting 100M rows per year, I could easily see some kind of created_date column and then having one partition per month.
One of the things we hate most about very large tables is the amount of time necessary to vacuum or index them. CPU and disk IO can only go so fast, so eventually you encounter a point where it can take hours to index a single column. If you let your table get too big, your maintenance costs will be prohibitive, and partitioning may be required at that point.
As an example, we have a table that was over 100M rows and we have enough memory that the entire table was in system cache. Even so, rebuilding the indexes on that table required about an hour and ten minutes *per index*. We knew this would happen and ran the reindex in parallel, which we confirmed by watching five of our CPUs sit at 99% utilization for the whole interval.
That wouldn't have happened if the table were partitioned.
3. If you had to quantify a limit of numbers of rows per table in a single postgresql database server what would you say?
I'd personally avoid having any tables over 10-million rows. We have quad Xeon E7450's, tons of ram, and even NVRAM PCI cards to reduce IO contention, and still, large tables are a nuisance. Even the best CPU will balk at processing 10-million rows quickly.
And yes. Good queries and design will help. Always limiting result sets will help. Efficient, highly selective indexes will help. But maintenance grows linearly, despite our best efforts. The only way to sidestep that issue is to partition tables or rewrite your application to scale horizontally via data sharding or some other shared-nothing cluster with plProxy, GridSQL or PGPool.
You'll have this problem with any modern database. Big tables are a pain in everybody's asses.
It's too bad PostgreSQL can't assign one thread per data-file and merge the results.
-- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604 312-676-8870 sthomas@xxxxxxxxx ______________________________________________ See http://www.peak6.com/email_disclaimer.php for terms and conditions related to this email -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance