Search Postgresql Archives

Re: Fast way to delete big table?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello

I am not sure if this is an option, but would partial indexes be of help? You could create your indexes in such a way that you exclude the rows that are not needed for your current queries, i.e. those that you would summarize in a separate table and delete using the same condition that you use for selecting which rows must be deleted.

CREATE INDEX ... ON tablea (...) WHERE ...;

This would not make deletion quicker, but operations should have a gain.

What I also see is that you don't have any condition when selecting from the loop, i.e. you actually go through all rows and decide on the fly which you must delete. Maybe reducing the number of rows to be processed by the function helps.

Finally I don't know the structure of the table, but you may select only those columns that you need for your summary instead of *. If you have many columns this could have an impact on the performance of the select statement.

Regards
Charles

On 05/16/2016 07:09 AM, Haiming Zhang wrote:
Hi All,

I have a big table (200G with 728 million records), the table slows down
lots of things. It's time to clean the data up. The requirement is when
I delete I must summarise few columns to a new table for backup purpose.
So that mean I need to go through each row and add the columns' value to
the summary table (the corresponding category).

The table has indexes (multicoumn indexes) before, I removed the indexes
because I read on forum says the indexes *heavily* reduce the speed of
deleting. That's true if I only delete from the table, but my case is I
first SELECT this row, then add few values to the summary table and then
DELETE the row according to its multiple primary key. This seems to be a
bad idea now as it takes very long time to DELETE the row (as delete
needs to find the row first).

Here are the two version of the delete functions, please help to point
out how can I speed it up.

1.

CREATE OR REPLACE FUNCTION summary_delete_table()

RETURNS integer AS

$BODY$

DECLARE


rec                     RECORD;


subrec                  RECORD;

BEGIN

                                                                 FOR rec
IN SELECT * FROM tableA limit 100 LOOP

                                                                                                                                 BEGIN

                                                                                                                                                 UPDATE summaryTable set count1 = rec.count1 + rec.count1, count2 =... where category match;

                                                                                                                                                 delete from tableA where tableA.primaryKeyA=rec.primaryKeyA and tableA.primaryKeyB=rec.primaryKeyB;

                                                                                                                                 END;

                                                                 END LOOP;

                                                                 return 1;

END;

$BODY$

LANGUAGE plpgsql VOLATILE

COST 100;

And then I have a .bat script to loop the above function million times.

2.

CREATE OR REPLACE FUNCTION summary_delete_table()

RETURNS integer AS

$BODY$

DECLARE


rec                     RECORD;


td_cursor            CURSOR FOR SELECT * FROM tableA limit 100;

BEGIN

                                                                 FOR rec
IN td_cursor LOOP

                                                                                                 BEGIN

                                                                                                                 UPDATE summaryTable set count1 = rec.count1 + rec.count1, count2 =... where category match;

                                                                                                                 delete from tableA WHERE CURRENT OF td_cursor;

                                                                                                 END;

                                                                 END LOOP;

                                                                 return 1;

END;

$BODY$

LANGUAGE plpgsql VOLATILE

COST 100;

Method 2 is bit faster but not much, the delete speed is 2478 rows/s for
method 2 and 2008 rows/s for method 1.

Any suggestions are welcomed.

BTW, I guess if reindex, it may take few days to finish.

Also, I tried change delete 100 rows at a time and 1000, 2000. The
result showed 1000 is faster than 100 and 2000 a time.

Thanks and Regards,

Haiming

	

*Haiming Zhang* | Engineer | *Redflex Group*
*T* +61 3 9674 1868 | *F* +61 3 9699 3566 | *E*
Haiming.Zhang@xxxxxxxxxxxxxx
31 Market Street, (PO Box 720), South Melbourne VIC 3205, Australia

If you are not an authorised recipient of this e-mail, please contact me
at Redflex immediately by return phone call or by email. In this case,
you should not read, print, retransmit, store or act in reliance on this
e-mail or any attachments, and should destroy all copies of them. This
e-mail and any attachments are confidential and may contain privileged
information and/or copyright material of Redflex or third parties. You
should only retransmit, distribute or commercialise the material if you
are authorised to do so. This notice should not be removed.


--
Swiss PostgreSQL Users Group
c/o Charles Clavadetscher
Motorenstrasse 18
CH - 8005 Zürich

http://www.swisspug.org


--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux