Thanks for the suggestion. That is essentially what I tried (except for the final reindex, which I planned to do once the table was cleaned up), but it was too slow: it ran for two days and not much happened.
Truncate does not work for my purpose.
Regards,
Haiming
Haiming Zhang | Engineer | Redflex Group
-----Original Message-----
From: Gavin Flower [mailto:GavinFlower@xxxxxxxxxxxxxxxxx]
Sent: Monday, 16 May 2016 3:24 PM
To: Haiming Zhang <Haiming.Zhang@xxxxxxxxxxxxxx>; pgsql-general@xxxxxxxxxxxxxx
Subject: Re: Fast way to delete big table?
On 16/05/16 17:09, Haiming Zhang wrote:
>
> Hi All,
>
> I have a big table (200G with 728 million records) that slows down
> lots of things. It's time to clean the data up. The requirement is
> that when I delete, I must summarise a few columns into a new table
> for backup purposes. That means I need to go through each row and add
> the columns' values to the summary table (under the corresponding
> category).
>
> The table used to have indexes (multicolumn indexes). I removed them
> because I read on a forum that indexes *heavily* reduce the speed of
> deleting. That's true if I only delete from the table, but in my case
> I first SELECT a row, then add a few values to the summary table, and
> then DELETE the row by its multi-column primary key. This now seems
> to have been a bad idea, as it takes a very long time to DELETE each
> row (the delete needs to find the row first, and without the indexes
> that is slow).
>
> Here are the two versions of the delete function; please help point
> out how I can speed it up.
>
> 1.
>
> CREATE OR REPLACE FUNCTION summary_delete_table()
>   RETURNS integer AS
> $BODY$
> DECLARE
>     rec RECORD;
>     subrec RECORD;
> BEGIN
>     FOR rec IN SELECT * FROM tableA LIMIT 100 LOOP
>         BEGIN
>             UPDATE summaryTable SET count1 = rec.count1 + rec.count1,
>                 count2 =... WHERE category match;
>             DELETE FROM tableA WHERE tableA.primaryKeyA = rec.primaryKeyA
>                 AND tableA.primaryKeyB = rec.primaryKeyB;
>         END;
>     END LOOP;
>     RETURN 1;
> END;
> $BODY$
> LANGUAGE plpgsql VOLATILE
> COST 100;
>
> And then I have a .bat script that calls the above function millions of times.
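> (Each iteration of the script essentially just issues the call below, so
> each batch of 100 rows is committed on its own:)
>
>     SELECT summary_delete_table();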
>
> 2.
>
> CREATE OR REPLACE FUNCTION summary_delete_table()
>   RETURNS integer AS
> $BODY$
> DECLARE
>     rec RECORD;
>     td_cursor CURSOR FOR SELECT * FROM tableA LIMIT 100;
> BEGIN
>     FOR rec IN td_cursor LOOP
>         BEGIN
>             UPDATE summaryTable SET count1 = rec.count1 + rec.count1,
>                 count2 =... WHERE category match;
>             DELETE FROM tableA WHERE CURRENT OF td_cursor;
>         END;
>     END LOOP;
>     RETURN 1;
> END;
> $BODY$
> LANGUAGE plpgsql VOLATILE
> COST 100;
>
> Method 2 is a bit faster, but not by much: the delete speed is 2478
> rows/s for method 2 versus 2008 rows/s for method 1.
>
> Any suggestions are welcomed.
>
> BTW, I guess that if I reindex, it may take a few days to finish.
>
> Also, I tried deleting 100, 1000 and 2000 rows at a time. The results
> showed that 1000 at a time is faster than either 100 or 2000.
>
> Thanks and Regards,
>
> Haiming
>
[...]
In one transaction (rough sketch below):
1. populate the summary table
2. drop indexes on tableA
3. delete selected entries in tableA
4. recreate indexes for tableA
If deleting all entries, then simply truncate it!
N.B. I have NOT checked the fine print in the documentation, nor tested this - so be warned! :-)
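Untested sketch of the above (table, column and index names are placeholders based on the names in your functions; if only some rows are being removed, the same WHERE clause belongs on both the INSERT and the DELETE):

BEGIN;

-- 1. populate the summary table from the rows that are about to go
--    (or UPDATE the existing summary rows instead, as in your functions)
INSERT INTO summaryTable (category, count1, count2)
SELECT category, sum(count1), sum(count2)
FROM tableA
GROUP BY category;

-- 2. drop the multicolumn indexes so the delete does not have to maintain them
DROP INDEX tableA_multicol_idx;

-- 3. delete the selected entries in one set-based statement
DELETE FROM tableA;

-- 4. recreate the indexes
CREATE INDEX tableA_multicol_idx ON tableA (primaryKeyA, primaryKeyB);

COMMIT;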
Cheers,
Gavin