Hi,

Thanks again very much for these clear-cut answers.

I think I'll try to implement the partitioning despite all the difficulties
you raise about it in this thread, because I can't find any other viable
solution for this situation right now. It will force me to change the data
model to work around the inheritance foreign key issue, but I will at least
test it, because we have limited resources and can't afford many servers
or the like to boost performance...
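To make the constraint concrete, this is roughly the limitation I'm working
around: a foreign key that references the parent of an inheritance hierarchy
never sees rows stored in the children. A minimal sketch; all table and
column names here are made up, not my real schema:

  -- Hypothetical parent and monthly child (illustrative names only).
  CREATE TABLE measure (
      id           bigint PRIMARY KEY,
      created_date date NOT NULL
  );

  CREATE TABLE measure_2011_03 (
      CHECK (created_date >= DATE '2011-03-01'
         AND created_date <  DATE '2011-04-01')
  ) INHERITS (measure);

  -- A table that wants to reference measures by id.
  CREATE TABLE sample (
      id         bigint PRIMARY KEY,
      measure_id bigint REFERENCES measure (id)
  );

  -- The row lives in the child, so the foreign key check on the parent
  -- cannot see it and the second insert fails.
  INSERT INTO measure_2011_03 VALUES (1, DATE '2011-03-15');
  INSERT INTO sample VALUES (1, 1);  -- ERROR: foreign key violation

That is the issue forcing the data model change.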
Best Regards
On 15/03/2011 14:18, Shaun Thomas wrote:
On 03/15/2011 05:10 AM, Samba GUEYE wrote:
1. Why is "... partitioning is not a good idea ...", as you said, Robert,
and why did Conor say "... I grant that it would be better to never need
to do that"?
There are a number of difficulties the planner has with partitioned
tables. Until very recently, MAX and MIN would scan every single
partition, even if you performed the aggregate on the constraint column.
Basically, quite a few queries will either not get an ideal execution
plan, or act in unexpected ways, unless you physically target the
exact partition you want.
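As a rough illustration of what "target the exact partition" means with an
inheritance-based setup (the table and column names below are just examples,
not anything from your schema):

  -- Querying the parent can touch every child partition:
  SELECT max(created_date) FROM measurements;

  -- Hitting the exact child avoids the problem entirely:
  SELECT max(created_date) FROM measurements_2011_03;

  -- With constraint exclusion on, the planner can at least prune
  -- children whose CHECK constraints contradict the WHERE clause:
  SET constraint_exclusion = partition;
  EXPLAIN SELECT *
    FROM measurements
   WHERE created_date >= DATE '2011-03-01'
     AND created_date <  DATE '2011-04-01';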
Even though we have several tables over 50 million rows, I'm reluctant
to partition them, because we have a very transaction-intensive database
and can't risk the possible penalties.
2. Is there another way or strategy to deal with very large tables
(over 100,000,000 rows per year in one table) beyond indexing and
partitioning?
What you have is a very good candidate for partitioning... if you can
effectively guarantee a column to partition the data on. If you're
getting 100M rows per year, I could easily see using some kind of
created_date column and having one partition per month.
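If you go that route on 8.x/9.0, the usual mechanism is table inheritance
plus CHECK constraints and an insert trigger. A minimal sketch, with made-up
names (measurements, created_date) standing in for your actual schema:

  -- Illustrative parent table.
  CREATE TABLE measurements (
      id           bigserial,
      created_date date NOT NULL,
      payload      text
  );

  -- One child per month; the CHECK constraint is what lets the
  -- planner exclude partitions that cannot match a date range.
  CREATE TABLE measurements_2011_03 (
      CHECK (created_date >= DATE '2011-03-01'
         AND created_date <  DATE '2011-04-01')
  ) INHERITS (measurements);

  CREATE INDEX measurements_2011_03_created_idx
      ON measurements_2011_03 (created_date);

  -- Route inserts on the parent into the right child.
  CREATE OR REPLACE FUNCTION measurements_insert() RETURNS trigger AS $$
  BEGIN
      IF NEW.created_date >= DATE '2011-03-01'
         AND NEW.created_date < DATE '2011-04-01' THEN
          INSERT INTO measurements_2011_03 VALUES (NEW.*);
      ELSE
          RAISE EXCEPTION 'no partition for date %', NEW.created_date;
      END IF;
      RETURN NULL;  -- keep the row out of the parent
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER measurements_insert_trg
      BEFORE INSERT ON measurements
      FOR EACH ROW EXECUTE PROCEDURE measurements_insert();

Bear in mind that indexes and primary/foreign key constraints do not
propagate to the children, which is the same limitation you ran into.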
One of the things we hate most about very large tables is the amount
of time necessary to vacuum or index them. CPU and disk IO can only go
so fast, so eventually you encounter a point where it can take hours
to index a single column. If you let your table get too big, your
maintenance costs will be prohibitive, and partitioning may be
required at that point.
As an example, we had a table that was over 100M rows, and we had
enough memory that the entire table fit in the system cache. Even so,
rebuilding the indexes on that table took about an hour and ten
minutes *per index*. We knew this would happen and ran the reindexes in
parallel, which we confirmed by watching five of our CPUs sit at 99%
utilization for the whole interval.
That wouldn't have happened if the table were partitioned.
3. If you had to quantify a limit of numbers of rows per table in a
single postgresql database server what would you say?
I'd personally avoid having any tables over 10 million rows. We have
quad Xeon E7450s, tons of RAM, and even NVRAM PCI cards to reduce IO
contention, and still, large tables are a nuisance. Even the best CPU
will balk at processing 10 million rows quickly.
And yes. Good queries and design will help. Always limiting result
sets will help. Efficient, highly selective indexes will help. But
maintenance grows linearly, despite our best efforts. The only way to
sidestep that issue is to partition tables or rewrite your application
to scale horizontally via data sharding or some other shared-nothing
cluster with plProxy, GridSQL or PGPool.
You'll have this problem with any modern database. Big tables are a
pain in everybody's asses.
It's too bad PostgreSQL can't assign one thread per data-file and
merge the results.