On 05/08/2010 06:39 AM, thilo wrote:
Hi all!

We moved from MySQL to PostgreSQL for some of our projects. So far we're very impressed with the performance (especially INSERTs and UPDATEs), except for a strange problem with the following bulk delete query:

    DELETE FROM table1 WHERE table2_id = ?

I went through these wiki pages, trying to solve the problem:

    http://wiki.postgresql.org/wiki/SlowQueryQuestions
    http://wiki.postgresql.org/wiki/Performance_Optimization

but unfortunately without much luck.

Our application runs batch jobs. On every batch run, we must delete approx. 1M rows in table1 and recreate these entries. The inserts are very fast, but the deletes are not. We cannot use updates, because there is no identifying property in the objects of table1.

This is what EXPLAIN is telling me:

    EXPLAIN ANALYZE DELETE FROM table1 WHERE table2_id = 11242939

                                                           QUERY PLAN
    ----------------------------------------------------------------------------------------------------------------------------
     Index Scan using sr_index on table1  (cost=0.00..8.56 rows=4 width=6) (actual time=0.111..0.154 rows=4 loops=1)
       Index Cond: (table2_id = 11242939)
     Total runtime: 0.421 ms
    (3 rows)

This seems to be very fast (using the index), but running this query from JDBC takes up to 20 ms each. For 1M rows this sums up to several hours. When I look at pg_top, the postgres process spends most of its time on the deletes, and CPU usage is at 100% (for the core used by PostgreSQL). So it seems that PostgreSQL is doing some sequential scanning or constraint checking.

This is the table structure:

    id         bigint (primary key)
    table2_id  bigint (foreign key constraint to table2, *indexed*)
    table3_id  bigint (foreign key constraint to table3, *indexed*)
    some non-referenced text and boolean fields

My server settings (PostgreSQL 8.4.2):

    shared_buffers = 1024MB
    effective_cache_size = 2048MB
    work_mem = 128MB
    wal_buffers = 64MB
    checkpoint_segments = 32
    checkpoint_timeout = 15min
    checkpoint_completion_target = 0.9

It would be very nice if you could give me a hint to solve the problem. It drives me crazy ;-) If you need more details please feel free to ask!

Thanks in advance for your help!

Kind regards
Thilo
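For context, the per-row delete pattern described above (one parameterized DELETE per table2_id, executed individually over JDBC) presumably looks something like the sketch below; the method and variable names are illustrative assumptions, not code from the original post:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.List;

    // Hypothetical sketch of the per-row delete loop: one statement execution,
    // and therefore one client/server round trip, per id.
    static void deletePerRow(Connection conn, List<Long> table2Ids) throws SQLException {
        PreparedStatement ps = conn.prepareStatement(
                "DELETE FROM table1 WHERE table2_id = ?");
        try {
            for (Long id : table2Ids) {
                ps.setLong(1, id);
                ps.executeUpdate();   // each call is a separate round trip (~20 ms as reported above)
            }
        } finally {
            ps.close();
        }
    }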
I am going to guess that the slow part is sending 1M different queries back and forth between client and server. You could try batching them together:

    DELETE FROM table1 WHERE table2_id IN (11242939, 1, 2, 3, 4, 5, ..., 42);

Also, are you preparing the query?

-Andy
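A rough JDBC sketch of the two suggestions above, i.e. sending the ids in batches rather than one statement each and reusing a single prepared statement. The batch size and the use of PostgreSQL's "= ANY(array)" form are assumptions for illustration, not something from the thread:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical sketch: delete ids in batches with one reusable prepared statement.
    static void deleteBatched(Connection conn, List<Long> table2Ids) throws SQLException {
        final int BATCH_SIZE = 1000;   // assumed value; tune for your workload
        PreparedStatement ps = conn.prepareStatement(
                "DELETE FROM table1 WHERE table2_id = ANY (?)");
        try {
            List<Long> batch = new ArrayList<Long>();
            for (Long id : table2Ids) {
                batch.add(id);
                if (batch.size() == BATCH_SIZE) {
                    ps.setArray(1, conn.createArrayOf("bigint", batch.toArray()));
                    ps.executeUpdate();   // one round trip per batch instead of one per id
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) {       // flush the final, partially filled batch
                ps.setArray(1, conn.createArrayOf("bigint", batch.toArray()));
                ps.executeUpdate();
            }
        } finally {
            ps.close();
        }
    }

The ANY(array) form keeps the statement text identical for every batch, so the same prepared statement can be reused even when the last batch contains fewer ids.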