Re: Very long deletion time on a 200 GB database

Hi again, everyone.

Wow, I can't get over how helpful everyone has been.

Shaun wrote:

The main problem you're going to run into is that your table is larger than the memory in that server. 4GB is really pretty small for a server hosting a 200+GB database. That they didn't mean it to get that big doesn't really help you clean it up.

Yep! And as you pointed out later in your note, PostgreSQL isn't the only thing running on this computer; there's also a full-fledged Windows application normally running on it. And the nature of the manufacturing, black-box context means that maintenance is supposed to be rare, and that anything that takes us off of a 24/7 work schedule is enormously expensive.

This has been a fun problem to fix, for sure... We're not there yet, but I feel like we're really close.

I'm currently trying a hybrid approach, based on several suggestions that were posted to this list:

Given that nothing else should be running during this maintenance operation, I'm going to bump up shared_buffers. And even after the maintenance is done, having shared_buffers set so ridiculously low can't be helping anything, so I'll keep it higher going forward.
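Just to make it concrete, the kind of change I have in mind looks like the lines below. The actual numbers are still to be decided (they depend on how much memory the Windows application needs), and shared_buffers requires a restart to take effect:

    # postgresql.conf -- illustrative values only, not our final settings
    shared_buffers = 1GB            # up from the tiny current value; needs a restart
    maintenance_work_mem = 256MB    # helps any VACUUM / index cleanup we run afterwards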

I finally remembered why I had such a tortured set of subselects in my original query: if you're going to use LIMIT, you had better know what you're doing, because without an ORDER BY clause you're in for surprises. And sure enough, in our testing I realized that when we asked the database for up to 5 rows, we got the same rows again and again, so the process stopped making progress after deleting only a few batches.
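(To illustrate the kind of pattern I mean -- with simplified stand-in table and column names, not our actual query:)

    -- Nothing guarantees that the 5 rows picked here differ from the 5 rows
    -- picked on the previous call; without ORDER BY, any 5 qualifying rows
    -- may come back, so the loop can spin on the same set.
    DELETE FROM b
     WHERE report_id IN (SELECT id
                           FROM r
                          WHERE created_at < now() - interval '30 days'
                          LIMIT 5);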

So I changed tactics somewhat, and it appears to be working much, much faster: I first created a table (not a temp table, simply because my functions are invoked by the .NET application over a new connection each time, and I obviously don't want the table to disappear between calls) containing the IDs from the R table that are older than n days. This table has about 200,000 rows in it, but each column is an int, so it's pretty small.
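Roughly speaking, that first step is something like this (the column names and the exact cutoff are placeholders, not our real schema):

    -- A plain table, not a temp table, so it survives across the separate
    -- connections that the .NET application opens for each call.
    CREATE TABLE old_report_ids AS
        SELECT id
          FROM r
         WHERE created_at < now() - interval '30 days';   -- "older than n days"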

I then have a separate function that takes one parameter, the chunk size. I loop through the table created by the first function (old_report_ids), delete all of the rows in the B table that reference that R row, then remove the row from old_report_ids, and loop again until I've processed a full chunk. There are undoubtedly more elegant ways to do this, but we just gotta get it working at this point. :-)
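The gist of that second function is something like this (again simplified, with placeholder column names and no error handling):

    CREATE OR REPLACE FUNCTION delete_old_reports_chunk(chunk_size integer)
    RETURNS integer AS $$
    DECLARE
        rec RECORD;
        n   integer := 0;
    BEGIN
        FOR rec IN SELECT id FROM old_report_ids LIMIT chunk_size LOOP
            -- Delete the child rows in B that reference this R row...
            DELETE FROM b WHERE report_id = rec.id;
            -- ...then cross the R id off the work list.
            DELETE FROM old_report_ids WHERE id = rec.id;
            n := n + 1;
        END LOOP;
        RETURN n;    -- 0 means the work list is empty
    END;
    $$ LANGUAGE plpgsql;

The .NET side then just calls it repeatedly -- SELECT delete_old_reports_chunk(1000); -- until it returns 0.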

We're about to test this, but from my small tests on my computer, it ran much, much faster than other options. We'll see what happens when we try it now on the 200 GB monster...

Reuven

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

