Search Postgresql Archives

Re: Fast way to delete big table?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 16/05/16 17:09, Haiming Zhang wrote:

Hi All,

I have a big table (200G with 728 million records), the table slows down lots of things. It's time to clean the data up. The requirement is when I delete I must summarise few columns to a new table for backup purpose. So that mean I need to go through each row and add the columns' value to the summary table (the corresponding category).

The table has indexes (multicoumn indexes) before, I removed the indexes because I read on forum says the indexes *heavily* reduce the speed of deleting. That's true if I only delete from the table, but my case is I first SELECT this row, then add few values to the summary table and then DELETE the row according to its multiple primary key. This seems to be a bad idea now as it takes very long time to DELETE the row (as delete needs to find the row first).

Here are the two version of the delete functions, please help to point out how can I speed it up.

1.

CREATE OR REPLACE FUNCTION summary_delete_table()

RETURNS integer AS

$BODY$

DECLARE

rec                     RECORD;

subrec                  RECORD;

BEGIN

FOR rec IN SELECT * FROM tableA limit 100 LOOP

BEGIN

UPDATE summaryTable set count1 = rec.count1 + rec.count1, count2 =... where category match;

delete from tableA where tableA.primaryKeyA=rec.primaryKeyA and tableA.primaryKeyB=rec.primaryKeyB;

END;

END LOOP;

return 1;

END;

$BODY$

LANGUAGE plpgsql VOLATILE

COST 100;

And then I have a .bat script to loop the above function million times.

2.

CREATE OR REPLACE FUNCTION summary_delete_table()

RETURNS integer AS

$BODY$

DECLARE

rec                     RECORD;

td_cursor            CURSOR FOR SELECT * FROM tableA limit 100;

BEGIN

FOR rec IN td_cursor LOOP

BEGIN

UPDATE summaryTable set count1 = rec.count1 + rec.count1, count2 =... where category match;

delete from tableA WHERE CURRENT OF td_cursor;

END;

END LOOP;

return 1;

END;

$BODY$

LANGUAGE plpgsql VOLATILE

COST 100;

Method 2 is bit faster but not much, the delete speed is 2478 rows/s for method 2 and 2008 rows/s for method 1.

Any suggestions are welcomed.

BTW, I guess if reindex, it may take few days to finish.

Also, I tried change delete 100 rows at a time and 1000, 2000. The result showed 1000 is faster than 100 and 2000 a time.

Thanks and Regards,

Haiming

[...]

In one transaction:

1. populate the summary table
2. drop indexes on tableA
3. delete selected entries in tableA
4. recreate indexes for tableA

If deleting all entries, then simply truncate it!

N.B. I have NOT checked the fine print in the documentation, nor tested this - so be warned! :-)


Cheers,
Gavin




--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux