Even though the column in question is not unique on t2, could you not
index it? That should improve the performance of the subquery.
Are dates applicable in any way? In some cases, adding a date field,
partitioning or indexing on it, and adding a WHERE date > (now - x days)
condition can be an effective way to limit the records searched.
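Something along these lines, for example (a sketch only -- the date
column name and the 30-day cutoff are made up for illustration):

-- Hypothetical: assumes a date column can be added to (or already exists on) t2
ALTER TABLE t2 ADD COLUMN event_date date;
CREATE INDEX t2_event_date_idx ON t2 (event_date);

-- The subquery is then restricted to recent rows only ("where date > x days")
DELETE FROM t1
WHERE t1.annotation_id IN (
    SELECT t2.annotation_id
    FROM t2
    WHERE t2.event_date > current_date - 30
);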
Kris
On 24-Nov-09, at 9:59, "Jerry Champlin" <jchamplin@xxxxxxxxxxxxxxxxxxxxxxxx> wrote:
You may want to consider using partitioning. That way you can drop the
appropriate partition and never have the overhead of a delete.
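Roughly like this, for example (just a sketch -- it assumes t1 can be
range-partitioned on some date column, here a hypothetical created_on,
using the usual inheritance/constraint-exclusion setup):

-- Hypothetical child partition holding one month of data
CREATE TABLE t1_2009_11 (
    CHECK (created_on >= DATE '2009-11-01' AND created_on < DATE '2009-12-01')
) INHERITS (t1);

-- Inserts are routed to the right child via a trigger or by the application.
-- Purging a month is then a cheap metadata operation instead of a huge DELETE:
DROP TABLE t1_2009_11;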
Jerry Champlin|Absolute Performance Inc.|Mobile: 303-588-2547
-----Original Message-----
From: pgsql-performance-owner@xxxxxxxxxxxxxx [mailto:pgsql-performance-owner@xxxxxxxxxxxxxx] On Behalf Of Luca Tettamanti
Sent: Tuesday, November 24, 2009 6:37 AM
To: pgsql-performance@xxxxxxxxxxxxxx
Subject: DELETE performance problem
Hello,
I've run into a severe performance problem with the following statement:
DELETE FROM t1 WHERE t1.annotation_id IN (
SELECT t2.annotation_id FROM t2)
t1 contains about 48M records (table size is 5.8GB), while t2 contains
about 60M records (total size 8.6GB). annotation_id is the PK in t1 but
not in t2 (it's not even unique; in fact there are duplicates - there are
about 20M distinct annotation_id values in this table). There are no FKs
on either table.
I've killed the query after 14h(!) of runtime...
I've reproduced the problem using only the ids (extracted from the full
tables) with the following schemas:
test2=# \d t1
      Table "public.t1"
    Column     |  Type  | Modifiers
---------------+--------+-----------
 annotation_id | bigint | not null
Indexes:
    "t1_pkey" PRIMARY KEY, btree (annotation_id)

test2=# \d t2
      Table "public.t2"
    Column     |  Type  | Modifiers
---------------+--------+-----------
 annotation_id | bigint |
Indexes:
    "t2_idx" btree (annotation_id)
The DELETE above takes about 30 minutes to complete. The slowdown is not
as severe, but (IMHO) the behaviour is strange. On win2k8 with 8.3.8,
using procexp, I see the process churning the disk and using more memory
until it hits some limit (at about 1.8GB); then the IO slows down
considerably. See this screenshot[1].
This is exactly what happens with the full dataset.
This is the output of the explain:
test2=> explain analyze delete from t1 where annotation_id in (select annotation_id from t2);
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=1035767.26..2158065.55 rows=181605 width=6) (actual time=643395.565..1832056.588 rows=26185953 loops=1)
   Hash Cond: (t1.annotation_id = t2.annotation_id)
   ->  Seq Scan on t1  (cost=0.00..661734.12 rows=45874812 width=14) (actual time=0.291..179119.487 rows=45874812 loops=1)
   ->  Hash  (cost=1033497.20..1033497.20 rows=181605 width=8) (actual time=643393.742..643393.742 rows=26185953 loops=1)
         ->  HashAggregate  (cost=1031681.15..1033497.20 rows=181605 width=8) (actual time=571807.575..610178.552 rows=26185953 loops=1)
               ->  Seq Scan on t2  (cost=0.00..879289.12 rows=60956812 width=8) (actual time=2460.595..480446.581 rows=60956812 loops=1)
 Total runtime: 2271122.474 ms
(7 rows)

Time: 2274723,284 ms
An identical Linux machine (with 8.4.1) shows the same issue; with strace
I see a lot of seeks:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 90.37    0.155484          15     10601           read
  9.10    0.015649        5216         3           fadvise64
  0.39    0.000668           0      5499           write
  0.15    0.000253           0     10733           lseek
  0.00    0.000000           0         3           open
  0.00    0.000000           0         3           close
  0.00    0.000000           0         3           semop
------ ----------- ----------- --------- --------- ----------------
100.00    0.172054                 26845           total
(30s sample)
Before hitting the memory "limit" (AS on win2k8, unsure about Linux) the
trace is the following:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.063862           0    321597           read
  0.00    0.000000           0         3           lseek
  0.00    0.000000           0        76           mmap
------ ----------- ----------- --------- --------- ----------------
100.00    0.063862                321676           total
The machines have 8 cores (2x Xeon E5320) and 8GB of RAM. The Postgres
data directory is on a hardware (Dell PERC5) RAID mirror, with the log on
a separate array. One machine is running Linux 64-bit (Debian/stable),
the other win2k8 (32-bit).
shared_buffers = 512MB
work_mem = 512MB
maintenance_work_mem = 1GB
checkpoint_segments = 16
wal_buffers = 8MB
fsync = off # Just in case... usually it's enabled
effective_cache_size = 4096MB
(the machine with win2k8 is running with a smaller shared_buffers - 16MB)
Any idea on what's going wrong here?
thanks,
Luca
[1] http://img10.imageshack.us/i/psql2.png/