Re: Deleting more efficiently from large partitions

Greg Spiegelberg <gspiegelberg@xxxxxxxxx> · Tue, 16 Jun 2020 07:57:10 -0600

On Mon, Jun 15, 2020 at 7:39 PM Wells Oliver <wells.oliver@xxxxxxxxx> wrote:
Hi all. I have a partitioned table (by month from a date column), where each partition contains something like 400m rows.

Each partition is defined by a PK with a uuid and date field (the parent table is partitioned by range on the date), and two other columns.

In doing a delete for a specific date, e.g. DELETE FROM t WHERE date = '2019-09-01' AND uuid IN (SELECT uuid FROM temptable), it runs very efficiently.

I am trying to write a processing script that deletes for potentially multiple dates & uuid values, and it just takes hours, trying:

DELETE FROM t WHERE date = (SELECT DISTINCT date from temp) AND uuid IN (select uuid from tempuuds) -- no go, hours.

Tried USING, e.g. DELETE FROM t USING temp WHERE t.date = temp.date AND t.uuid = temp.uuid -- no go, hours.

I just can't delete from this table without an explicit date and a set of uuids using a WHERE IN approach, but I need to.I was thinking of making a plpgsql function or something that loops through dates and makes a more explicit DELETE statement, but I'm thinking there must be some better way using indexing or something.

Appreciate any tips.

Have you considered partitioning by day instead of month?  Could eliminate an index you may have on the date column.
How many days are in the many-days DELETE?  Could you simply wrap it in a transaction and do one DELETE per day?

You could potentially get better performance removing the JOIN/sub-SELECT using
DELETE FROM mytable WHERE date_col = ANY( ARRAY['2020-01-01', '2020-01-13']::date[] );

HTH
-Greg