Search Postgresql Archives

how to delete many rows from a huge table?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I have a table of about 1 G rows, and I want to delete about 5 M rows,
listed in another table.
The big table is

             Table "public.backlinks"
     Column      | Type  | Modifiers | Description
-----------------+-------+-----------+-------------
 key             | bytea | not null  |
 backlink        | text  | not null  |
 backlink_hash   | bytea | not null  |
 url             | text  | not null  |
 time_downloaded | date  | not null  |
Triggers:
    insert_backlinks_trigger BEFORE INSERT ON backlinks FOR EACH ROW
EXECUTE PROCEDURE backlinks_insert_trigger()
Has OIDs: no

It is divided into 64 partitions, like

            Table "public.backlinks_0"
     Column      | Type  | Modifiers | Description
-----------------+-------+-----------+-------------
 key             | bytea | not null  |
 backlink        | text  | not null  |
 backlink_hash   | bytea | not null  |
 url             | text  | not null  |
 time_downloaded | date  | not null  |
Indexes:
    "backlinks_0_pkey" PRIMARY KEY, btree (key, backlink_hash),
tablespace "fastdisk"
Check constraints:
    "backlinks_0_key_check" CHECK (partition(key) = 0)
Inherits: backlinks
Has OIDs: no

The partitions are very evenly filled.  The partition function just
takes the first byte modulo 64.

The table listing what I want to delete has just the key values.

If I just do

DELETE FROM backlinks b USING bad_links bl WHERE b.key = bl.key;

then it grinds for an hour or so and runs out of memory.

If I do

DELETE FROM backlinks b WHERE b.key IN (SELECT bl.key FROM bad_links
bl LIMIT 40 OFFSET 0);

it finishes in milliseconds, but if I set the LIMIT to 50, it runs for
minutes til I kill it.  EXPLAIN says that it is doing a sequential
scan on several of the partitions in the 50 case, but not the 40.
Auto-vacuuming is running on the DB.

If I put that DELETE into a procedure and loop on the offset, it acts
like the 50 case, even if i set the LIMIT to 1.

I am currently running a shell loop doing the delete in chunks of 40,
but it has been running all weekend and is only half way through.

What am I doing wrong?

-- 
mARK bLOORE <mbloore@xxxxxxxxx>

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux