Re: Slow Bulk Delete

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 05/08/2010 06:39 AM, thilo wrote:
Hi all!

We moved from MySQL to Postgresql for some of our projects. So far
we're very impressed with the performance (especially INSERTs and
UPDATEs), except for a strange problem with the following bulk delete
query:

DELETE FROM table1 WHERE table2_id = ?

I went through these Wiki pages, trying to solve the problem:

http://wiki.postgresql.org/wiki/SlowQueryQuestions and
http://wiki.postgresql.org/wiki/Performance_Optimization

but unfortunately without much luck.

Our application is doing batch jobs. On every batch run, we must
delete approx. 1M rows in table1 and recreate these entries. The
inserts are very fast, but deletes are not. We cannot make updates,
because there's no identifying property in the objects of table1.

This is what EXPLAIN is telling me:

EXPLAIN ANALYZE DELETE FROM table1 WHERE table2_id = 11242939
                                                          QUERY
PLAN
----------------------------------------------------------------------------------------------------------------------------
  Index Scan using sr_index on table1  (cost=0.00..8.56 rows=4 width=6)
(actual time=0.111..0.154 rows=4 loops=1)
    Index Cond: (table2_id = 11242939)
  Total runtime: 0.421 ms
(3 rows)

This seems to be very fast (using the index), but running this query
from JDBC takes up to 20ms each. For 1M rows this sum up to several
hours. When I have a look at pg_top psql uses most of the time for the
deletes. CPU usage is 100% (for the core used by postgresql). So it
seems that postgresql is doing some sequential scanning or constraint
checks.

This is the table structure:

id	bigint	 (primary key)
table2_id	bigint	 (foreign key constraint to table 2, *indexed*)
table3_id	bigint	 (foreign key constraint to table 3, *indexed*)
some non-referenced text and boolean fields

My server settings (Potgresql 8.4.2):

shared_buffers = 1024MB
effective_cache_size = 2048MB
work_mem = 128MB
wal_buffers = 64MB
checkpoint_segments = 32
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9

It would be very nice to give me a hint to solve the problem. It
drives me crazy ;-)

If you need more details please feel free to ask!

Thanks in advance for your help!

Kind regards

Thilo


I am going to guess the slow part is sending 1M different queries back and forth from client to server.  You could try batching them together:

DELETE FROM table1 WHERE table2_id in (11242939, 1,2,3,4,5...., 42);

Also are you preparing the query?

-Andy

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux