Re: make bulk deletes faster?

James Klo <jklo@xxxxxxxxxxx> · Mon, 19 Dec 2005 11:10:50 -0800

Mitch Skinner wrote:

Have you considered partitioning?

http://www.postgresql.org/docs/8.1/interactive/ddl-partitioning.html

If you can partition your timeblock table so that you archive an entire
partition at a time, then you can delete the archived rows by just
dropping (or truncating) that partition.  AFAIK there's no way to
"re-parent" a partition (e.g., from the timeblock table to the
timeblock_archive table).

If your app is particularly cooperative you might be able to use
partitioning to avoid moving data around entirely.  If table accesses
are always qualified by something you can use as a partitioning key,
then partitioning can give you the speed benefits of a small table
without the effort of keeping it cleared out.

Yes, I've considered partitioning as a long term change. I was thinking 
about this for other reasons - mainly performance.  If I go the 
partitioning route, would I need to even perform archival?

The larger problem that I need to solve is really twofold:

1. Need to keep reads on timeblocks that are from the current day 
through the following seven days very fast, especially current day reads.

2. Need to be able to maintain the timeblocks for reporting purposes, 
for at least a year (potentially more).  This could probably better 
handled performing aggregate analysis, but this isn't on my current radar.

Another good read, if you haven't yet, is
http://powerpostgresql.com/Downloads/annotated_conf_80.html
especially the "Memory", "Checkpoints", and maybe "WAL options"
sections.  If you're doing large deletes then you may need to increase
your free space map settings--if a VACUUM VERBOSE finishes by saying
that you need more FSM pages, then the table may have gotten bloated
over time (which can be fixed with a configuration change and a VACUUM
FULL, though this will lock everything else out of the table while it's
running).

Thanks, I will look into this as well.