Re: DELETE performance problem

"Jerry Champlin" <jchamplin@xxxxxxxxxxxxxxxxxxxxxxxx> · Tue, 24 Nov 2009 07:59:10 -0700

You may want to consider using partitioning.  That way you can drop the
appropriate partition and never have the overhead of a delete.

Jerry Champlin|Absolute Performance Inc.|Mobile:  303-588-2547

-----Original Message-----
From: pgsql-performance-owner@xxxxxxxxxxxxxx
[mailto:pgsql-performance-owner@xxxxxxxxxxxxxx] On Behalf Of Luca Tettamanti
Sent: Tuesday, November 24, 2009 6:37 AM
To: pgsql-performance@xxxxxxxxxxxxxx
Subject:  DELETE performance problem

Hello,
I've run in a severe performance problem with the following statement:

DELETE FROM t1 WHERE t1.annotation_id IN (
	SELECT t2.annotation_id FROM t2)

t1 contains about 48M record (table size is 5.8GB), while t2 contains about
60M
record (total size 8.6GB). annotation_id is the PK in t1 but not in t2 (it's
not even unique, in fact there are duplicates - there are about 20M distinct
annotation_id in this table). There are no FKs on either tables.
I've killed the query after 14h(!) of runtime...

I've reproduced the problem using a only the ids (extracted from the full
tables) with the following schemas:

test2=# \d t1
         Table "public.t1"
    Column     |  Type  | Modifiers
---------------+--------+-----------
 annotation_id | bigint | not null
Indexes:
    "t1_pkey" PRIMARY KEY, btree (annotation_id)

test2=# \d t2
         Table "public.t2"
    Column     |  Type  | Modifiers
---------------+--------+-----------
 annotation_id | bigint |
Indexes:
    "t2_idx" btree (annotation_id)

The query above takes about 30 minutes to complete. The slowdown is not as
severe, but (IMHO) the behaviour is strange. On a win2k8 with 8.3.8 using
procexp I see the process churning the disk and using more memory until it
hits
some limit (at about 1.8GB) then the IO slows down considerably. See this
screenshot[1].
This is exactly what happens with the full dataset.

This is the output of the explain:

test2=> explain analyze delete from t1 where annotation_id in (select
annotation
_id from t2);
                                                               QUERY PLAN

----------------------------------------------------------------------------
----
---------------------------------------------------------
 Hash Join  (cost=1035767.26..2158065.55 rows=181605 width=6) (actual
time=64339
5.565..1832056.588 rows=26185953 loops=1)
   Hash Cond: (t1.annotation_id = t2.annotation_id)
   ->  Seq Scan on t1  (cost=0.00..661734.12 rows=45874812 width=14) (actual
tim
e=0.291..179119.487 rows=45874812 loops=1)
   ->  Hash  (cost=1033497.20..1033497.20 rows=181605 width=8) (actual
time=6433
93.742..643393.742 rows=26185953 loops=1)
         ->  HashAggregate  (cost=1031681.15..1033497.20 rows=181605
width=8) (a
ctual time=571807.575..610178.552 rows=26185953 loops=1)
               ->  Seq Scan on t2  (cost=0.00..879289.12 rows=60956812
width=8)
(actual time=2460.595..480446.581 rows=60956812 loops=1)
 Total runtime: 2271122.474 ms
(7 rows)

Time: 2274723,284 ms

An identital linux machine (with 8.4.1) shows the same issue; with strace I
see
a lots of seeks:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 90.37    0.155484          15     10601           read
  9.10    0.015649        5216         3           fadvise64
  0.39    0.000668           0      5499           write
  0.15    0.000253           0     10733           lseek
  0.00    0.000000           0         3           open
  0.00    0.000000           0         3           close
  0.00    0.000000           0         3           semop
------ ----------- ----------- --------- --------- ----------------
100.00    0.172054                 26845           total

(30s sample) 

Before hitting the memory "limit" (AS on win2k8, unsure about Linux) the
trace
is the following:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.063862           0    321597           read
  0.00    0.000000           0         3           lseek
  0.00    0.000000           0        76           mmap
------ ----------- ----------- --------- --------- ----------------
100.00    0.063862                321676           total

The machines have 8 cores (2 Xeon E5320), 8GB of RAM. Postgres data
directory
is on hardware (Dell PERC5) raid mirror, with the log on a separate array.
One machine is running linux 64bit (Debian/stable), the other win2k8 (32
bit).

shared_buffers = 512MB
work_mem = 512MB
maintenance_work_mem = 1GB
checkpoint_segments = 16
wal_buffers = 8MB
fsync = off # Just in case... usually it's enabled
effective_cache_size = 4096MB

(the machine with win2k8 is running with a smaller shared_buffers - 16MB)

Any idea on what's going wrong here?

thanks,
Luca
[1] http://img10.imageshack.us/i/psql2.png/

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance