Re: Best way to delete big amount of records from big table

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 2020-03-27 at 15:13 +0100, Ekaterina Amez wrote:
> I'm trying to clean up a database with millions of records of 
> useless-but-don't-remove-just-in-case data. [...]
> 
> But also I'm cleaning tables with 150million records where I'm going to 
> remove 60% of existing data and after a few tests I'm not sure what's 
> the best approach as all seem to take similar time to run. These tables 
> are grouped in 4 tables group with master, detail, master_history, 
> detail_history structure. None of the tables have primary key nor 
> foreign key or any constraint but the sequence used for what should be 
> the PK column, though this column is not defined as PK.

You should define primary and foreign keys if you can, but I guess
I don't have to tell you that.

> I've decided to delete from the last one in chunks (10 days of data per 
> chunk but it coud be any other quantity) so I've created a function.  
> I've tested it with indexes (in master_hist for filtering data and in 
> detail_hist for the fk and pk), without indexes, after analyzing table, 
> and no matter what I always end up with more or less the same execution 
> time. I can afford the time it's getting to run but I'd like to know if 
> it's there a better way to do this.

There is no need to delete in batches unless you have a need to keep
transactions short (danger of deadlock because the data are still
modified, or you cannot afford to block autovacuum that long).

If you can drop the indexes while you do it (downtime), go for it.
Perhaps there is a way to use partial indexes that exclude all the
data that you have to delete, then work could go on as normal.

> I'm testing on version 9.2 BUT 
> production server is 8.4 (legacy application, supposed to be in at least 
> 9.2 but recently discovered it was 8.4, planning upgrade but not now). 
> Config parameters are default ones.

Now that is a seriously bad idea.  You should test on the same version
as you have running in production.  And you should insist in an upgrade.
People who insist in running ancient software often insist in ancient
hardware as well, and both is a good way to get data corruption.
If the system blows up, they are going to blame you.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com






[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux