Re: DELETE taking too much memory

Guillaume Lelarge <guillaume@xxxxxxxxxxxx> · Thu, 07 Jul 2011 22:26:45 +0200

On Thu, 2011-07-07 at 15:34 +0200, vincent dephily wrote:
> Hi,
> 
> I have a delete query taking 7.2G of ram (and counting) but I do not
> understant why so much memory is necessary. The server has 12G, and
> I'm afraid it'll go into swap. Using postgres 8.3.14.
> 
> I'm purging some old data from table t1, which should cascade-delete
> referencing rows in t2. Here's an anonymized rundown :
> 
> 
> # \d t1
>                              Table "public.t1"
>   Column   |            Type             |             Modifiers
> -----------+-----------------------------+---------------------------------
>  t1id      | integer                     | not null default
> nextval('t1_t1id_seq'::regclass)
> (...snip...)
> Indexes:
>     "message_pkey" PRIMARY KEY, btree (id)
> (...snip...)
> 
> # \d t2
>                                Table "public.t2"
>      Column      |            Type             |        Modifiers
> -----------------+-----------------------------+-----------------------------
>  t2id            | integer                     | not null default
> nextval('t2_t2id_seq'::regclass)
>  t1id            | integer                     | not null
>  foo             | integer                     | not null
>  bar             | timestamp without time zone | not null default now()
> Indexes:
>     "t2_pkey" PRIMARY KEY, btree (t2id)
>     "t2_bar_key" btree (bar)
>     "t2_t1id_key" btree (t1id)
> Foreign-key constraints:
>     "t2_t1id_fkey" FOREIGN KEY (t1id) REFERENCES t1(t1id) ON UPDATE
> RESTRICT ON DELETE CASCADE
> 
> # explain delete from t1 where t1id in (select t1id from t2 where
> foo=0 and bar < '20101101');
>                                QUERY PLAN
> -----------------------------------------------------------------------------
>  Nested Loop  (cost=5088742.39..6705282.32 rows=30849 width=6)
>    ->  HashAggregate  (cost=5088742.39..5089050.88 rows=30849 width=4)
>          ->  Index Scan using t2_bar_key on t2  (cost=0.00..5035501.50
> rows=21296354 width=4)
>                Index Cond: (bar < '2010-11-01 00:00:00'::timestamp
> without time zone)
>                Filter: (foo = 0)
>    ->  Index Scan using t1_pkey on t1  (cost=0.00..52.38 rows=1 width=10)
>          Index Cond: (t1.t1id = t2.t1id)
> (7 rows)
> 
> 
> Note that the estimate of 30849 rows is way off : there should be
> around 55M rows deleted from t1, and 2-3 times as much from t2.
> 
> When looking at the plan, I can easily imagine that data gets
> accumulated below the nestedloop (thus using all that memory), but why
> isn't each entry freed once one row has been deleted from t1 ? That
> entry isn't going to be found again in t1 or in t2, so why keep it
> around ?
> 
> Is there a better way to write this query ? Would postgres 8.4/9.0
> handle things better ?
> 

Do you have any DELETE triggers in t1 and/or t2?

-- 
Guillaume
  http://blog.guillaume.lelarge.info
  http://www.dalibo.com

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general