You may want to consider using partitioning. That way you can drop the appropriate partition and never have the overhead of a delete. Jerry Champlin|Absolute Performance Inc.|Mobile: 303-588-2547 -----Original Message----- From: pgsql-performance-owner@xxxxxxxxxxxxxx [mailto:pgsql-performance-owner@xxxxxxxxxxxxxx] On Behalf Of Luca Tettamanti Sent: Tuesday, November 24, 2009 6:37 AM To: pgsql-performance@xxxxxxxxxxxxxx Subject: DELETE performance problem Hello, I've run in a severe performance problem with the following statement: DELETE FROM t1 WHERE t1.annotation_id IN ( SELECT t2.annotation_id FROM t2) t1 contains about 48M record (table size is 5.8GB), while t2 contains about 60M record (total size 8.6GB). annotation_id is the PK in t1 but not in t2 (it's not even unique, in fact there are duplicates - there are about 20M distinct annotation_id in this table). There are no FKs on either tables. I've killed the query after 14h(!) of runtime... I've reproduced the problem using a only the ids (extracted from the full tables) with the following schemas: test2=# \d t1 Table "public.t1" Column | Type | Modifiers ---------------+--------+----------- annotation_id | bigint | not null Indexes: "t1_pkey" PRIMARY KEY, btree (annotation_id) test2=# \d t2 Table "public.t2" Column | Type | Modifiers ---------------+--------+----------- annotation_id | bigint | Indexes: "t2_idx" btree (annotation_id) The query above takes about 30 minutes to complete. The slowdown is not as severe, but (IMHO) the behaviour is strange. On a win2k8 with 8.3.8 using procexp I see the process churning the disk and using more memory until it hits some limit (at about 1.8GB) then the IO slows down considerably. See this screenshot[1]. This is exactly what happens with the full dataset. This is the output of the explain: test2=> explain analyze delete from t1 where annotation_id in (select annotation _id from t2); QUERY PLAN ---------------------------------------------------------------------------- ---- --------------------------------------------------------- Hash Join (cost=1035767.26..2158065.55 rows=181605 width=6) (actual time=64339 5.565..1832056.588 rows=26185953 loops=1) Hash Cond: (t1.annotation_id = t2.annotation_id) -> Seq Scan on t1 (cost=0.00..661734.12 rows=45874812 width=14) (actual tim e=0.291..179119.487 rows=45874812 loops=1) -> Hash (cost=1033497.20..1033497.20 rows=181605 width=8) (actual time=6433 93.742..643393.742 rows=26185953 loops=1) -> HashAggregate (cost=1031681.15..1033497.20 rows=181605 width=8) (a ctual time=571807.575..610178.552 rows=26185953 loops=1) -> Seq Scan on t2 (cost=0.00..879289.12 rows=60956812 width=8) (actual time=2460.595..480446.581 rows=60956812 loops=1) Total runtime: 2271122.474 ms (7 rows) Time: 2274723,284 ms An identital linux machine (with 8.4.1) shows the same issue; with strace I see a lots of seeks: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 90.37 0.155484 15 10601 read 9.10 0.015649 5216 3 fadvise64 0.39 0.000668 0 5499 write 0.15 0.000253 0 10733 lseek 0.00 0.000000 0 3 open 0.00 0.000000 0 3 close 0.00 0.000000 0 3 semop ------ ----------- ----------- --------- --------- ---------------- 100.00 0.172054 26845 total (30s sample) Before hitting the memory "limit" (AS on win2k8, unsure about Linux) the trace is the following: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 100.00 0.063862 0 321597 read 0.00 0.000000 0 3 lseek 0.00 0.000000 0 76 mmap ------ ----------- ----------- --------- --------- ---------------- 100.00 0.063862 321676 total The machines have 8 cores (2 Xeon E5320), 8GB of RAM. Postgres data directory is on hardware (Dell PERC5) raid mirror, with the log on a separate array. One machine is running linux 64bit (Debian/stable), the other win2k8 (32 bit). shared_buffers = 512MB work_mem = 512MB maintenance_work_mem = 1GB checkpoint_segments = 16 wal_buffers = 8MB fsync = off # Just in case... usually it's enabled effective_cache_size = 4096MB (the machine with win2k8 is running with a smaller shared_buffers - 16MB) Any idea on what's going wrong here? thanks, Luca [1] http://img10.imageshack.us/i/psql2.png/ -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance