Very long deletion time on a 200 GB database

Hi, everyone. I'm maintaining an application that exists as a "black box" in manufacturing plants. The system is based on Windows, .NET, and PostgreSQL 8.3. I'm a Unix kind of guy myself, but the application layer and system administration are being handled by other people; I'm just the PostgreSQL guy.

Because of the nature of the application, we don't have direct control over what happens. And it turns out that at one installation, we're quickly running out of disk space. The database is already taking up about 200 GB of space, and is growing by 1 GB or so a day. Switching disks, either to a larger/faster traditional drive, or even to an SSD, is not an option. (And yes, I know that SSDs have their own risks, but I'm just throwing that out as one option.)

Right now, the best solution to the space problem is to delete information associated with old records, where "old" means from at least 30 days ago. The old records are spread across a few tables, including many large objects. (The application was written by people who were new to PostgreSQL, and didn't realize that they could use BYTEA.) Basically, given a foreign key B.a_id that points to table A, I want to DELETE all rows in B whose corresponding row in A was created at least 30 days ago.

Unfortunately, when we implemented this simple delete, it executed slower than molasses, taking about 9 hours to do its thing. Not only does this seem like a really, really long time to do such deleting, but we have only a 4-hour window in which to run this maintenance activity, before the factory starts to use our black box again.

I've tried a few approaches so far, none of which have been hugely successful. The fact that it takes several hours to test each theory is obviously a bit of a pain, and so I'm curious to hear suggestions from people here.

I should note that my primary concern is available RAM. The database, as I wrote, is about 200 GB in size, and PostgreSQL is using (according to Windows) about 5 GB of RAM, plus another 25 GB of virtual memory. I've told the Windows folks on this project that virtual memory kills a database, and that it shouldn't surprise us to have horrible performance if the database and operating system are both transferring massive amounts of data back and forth. But there doesn't seem to be a good way to handle this.

This is basically what I'm trying to execute:

DELETE FROM B
WHERE r_id IN (SELECT R.id
                 FROM R, B
                WHERE R.end_date < (NOW() - (interval '1 day' * 30))
                  AND R.id = B.r_id);
(1) I tried to write this as a join, rather than a subselect. But B has an oid column that points to large objects, and on which we have a rule that removes the associated large object when a row in B is removed. Doing the delete as a join resulted in "no such large object with an oid of xxx" errors. (I'm not sure why, although it might have to do with the rule.)
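For reference, the join version was roughly the following (a sketch of what I tried, not the exact query):

DELETE FROM B
 USING R
 WHERE B.r_id = R.id
   AND R.end_date < (NOW() - (interval '1 day' * 30));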

(2) I tried to grab the rows that *do* interest me, put them into a temporary table, TRUNCATE the existing table, and then copy the rows back. I only tested that with a 1 GB subset of the data, but that took longer than other options.
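In outline, that test looked something like this ("b_keep" is just an illustrative name, and this sketch glosses over the large-object cleanup rule, which TRUNCATE doesn't fire):

BEGIN;
CREATE TEMPORARY TABLE b_keep AS
    SELECT B.*
      FROM B, R
     WHERE B.r_id = R.id
       AND R.end_date >= (NOW() - (interval '1 day' * 30));
TRUNCATE B;
INSERT INTO B SELECT * FROM b_keep;
COMMIT;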

(3) There are some foreign-key constraints on the B table. I thought that perhaps doing a mass DELETE was queueing up all of those constraint checks, and possibly using up lots of memory and/or taking a long time to execute. I thus rewrote my queries such that they first removed the constraints, then executed the DELETE, and then restored the constraints. That didn't seem to improve things much either, and it took a long time (30 minutes) just to remove the constraints. I expected re-adding the constraints to take a while, but shouldn't removing them be relatively quick? (The constraint juggling was along the lines of the sketch below.)
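Roughly (the constraint name here is a placeholder, not the real one):

ALTER TABLE B DROP CONSTRAINT b_r_id_fkey;
-- run the big DELETE here
ALTER TABLE B ADD CONSTRAINT b_r_id_fkey
    FOREIGN KEY (r_id) REFERENCES R (id);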

(4) I tried "chunking" the deletes, such that instead of trying to delete all of the records from the B table, I would instead delete just those associated with 100 or 200 rows from the R table. On a 1 GB subset of the data, this seemed to work just fine. But on the actual database, it was still far too slow.
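Each chunk was something like the following (again a sketch; the driver script keeps track of the last R.id processed and substitutes it for :last_r_id on each pass):

DELETE FROM B
 WHERE r_id IN (SELECT R.id
                  FROM R
                 WHERE R.end_date < (NOW() - (interval '1 day' * 30))
                   AND R.id > :last_r_id
                 ORDER BY R.id
                 LIMIT 100);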

I've been surprised by the time it takes to delete the records in question. I keep trying to tell the others on this project that PostgreSQL isn't inherently slow, but that a 200 GB database running on a non-dedicated machine, with an old version (8.3), and while it's swapping RAM, will be slow regardless of the database software we're using. But even so, 9 hours to delete 100 GB of data strikes me as a very long process.

Again, I continue to believe that given our hard deadlines, and the fact that we're using a large amount of virtual memory, there isn't really a solution that will work quickly and easily. But I'd be delighted to be wrong, and welcome any and all comments and suggestions for how to deal with this.

Reuven

--
Reuven M. Lerner -- Web development, consulting, and training
Mobile: +972-54-496-8405 * US phone: 847-230-9795
Skype/AIM: reuvenlerner


