On 10/30/2009 08:01 PM, Greg Stark wrote:
On Fri, Oct 30, 2009 at 12:53 PM, Anj Adu<fotographs@xxxxxxxxx> wrote:
Any relational database worth its salt has partitioning for a reason.
1. Maintenance. You will need to delete data at some
point.(cleanup)...Partitions are the only way to do it effectively.
This is true and it's unavoidably a manual process. The database will
not know what segments of the data you intend to load and unload en
masse.
2. Performance. Partitioning offer a way to query smaller slices of
data automatically (i.e the query optimizer will choose the partition
for you) ...very large tables are a no-no in any relational
database.(sheer size has limitations)
This I dispute. Databases are designed to be scalable and very large
tables should perform just as well as smaller tables.
Where partitions win for performance is when you know something about
how your data is accessed and you can optimize the access by
partitioning along the same keys. For example if you're doing a
sequential scan of just one partition or doing a merge join of two
equivalently partitioned tables and the partitions can be sorted in
memory.
However in these cases it is possible the database will become more
intelligent and be able to achieve the same performance gains
automatically. Bitmap index scans should perform comparably to the
sequential scan of individual partitions for example.
So, on the becoming more intelligent front: PostgreSQL already does
some operations as background maintenance (autovacuum). Extending
this to de-bloat indices does not seem conceptually impossible, nor for
the collection of table-data statistics for planner guidance (also, why
could a full-table-scan not collect stats as a side-effect?). Further out,
how about the gathering of statistics on queries to guide the automatic
creation of indices? Or to set up a partitioning scheme on a previously
monolithic table?
- Jeremy
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance