Re: make bulk deletes faster?

On 12/18/05, James Klo <jklo@xxxxxxxxxxx> wrote:
> explain analyze delete from timeblock where timeblockid = 666666
>
> Index Scan using timeblockid_idx on timeblock  (cost=0.00..5.28 rows=1
> width=6) (actual time=0.022..0.022 rows=0 loops=1)
>   Index Cond: (timeblockid = 666666)
> Total runtime: 0.069 ms
... snip ...
> Here's what I've tried:
>
> Attempt 1:
> ----------
> delete from timeblock where timeblockid in (select timeblockid from
> timeblock_tmp)

The DELETE in Attempt 1 contains a join, so if this is the way you're
mainly specifying which rows to delete, you'll have to take into
account how efficient the join of timeblock and timeblock_tmp is. What
does

EXPLAIN ANALYZE select * from timeblock where timeblockid in (select
timeblockid from timeblock_tmp)

or

EXPLAIN ANALYZE delete from timeblock where timeblockid in (select
timeblockid from timeblock_tmp)

say?

You *should* at least get a "Hash IN Join" for the outer loop, with
just one Seq Scan on timeblock_tmp. Otherwise, consider increasing
your sort_mem (PostgreSQL 7.x) or work_mem (PostgreSQL 8.x) setting.
Another alternative is to reduce the number of rows being archived in
one go, so that the hash fits within the sort_mem or work_mem that
allows the "Hash IN Join" plan. See
http://www.postgresql.org/docs/8.1/static/runtime-config-resource.html#GUC-WORK-MEM
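For illustration, the two approaches could look roughly like this
(the work_mem value is an arbitrary example; note that on 8.1 the
value is an integer in kilobytes, since unit suffixes like '64MB'
aren't accepted yet):

```sql
-- Bump work_mem for this session only, so the planner has room
-- to choose the hashed subplan (on 7.x the equivalent is sort_mem):
SET work_mem = 65536;   -- value in KB, i.e. 64 MB

DELETE FROM timeblock
WHERE timeblockid IN (SELECT timeblockid FROM timeblock_tmp);
```

Alternatively, only load a slice of the ids into timeblock_tmp at a
time, so each DELETE's hash table stays within the existing limit.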

On the other hand, PostgreSQL 8.1's partitioning sounds like a better
long-term solution that you might want to look into.
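A minimal sketch of what that could look like with 8.1's
inheritance-based partitioning (the child table name and the range
boundaries are made up for illustration):

```sql
-- Child table holding one non-overlapping range of timeblockid.
CREATE TABLE timeblock_p1 (
    CHECK (timeblockid >= 1 AND timeblockid < 1000000)
) INHERITS (timeblock);

-- With constraint exclusion enabled, queries against the parent
-- skip child tables whose CHECK constraint rules them out.
SET constraint_exclusion = on;

-- Archiving a whole range then becomes a cheap metadata operation
-- instead of a row-by-row bulk DELETE:
DROP TABLE timeblock_p1;
```

The win is that dropping (or detaching) a partition is O(1) with
respect to row count, whereas DELETE has to touch every row and
leaves dead tuples behind for VACUUM.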
