Natalie Wenz <nataliewenz@xxxxxxxxxxx> wrote:
Sorry; my description of what is going on was a little unclear. We didn't upgrade the existing database. We moved it to different hardware, and just created a brand new database to accept the data that had been backing up in sqlite files while our original database was offline. I'm still dealing with the wraparound on the original, just on a different machine.
OK, to restate to be sure I understand, the original database is being vacuumed in a new location, and a new 9.3 database in the original location has absorbed the queued data? Once you complete the vacuum, you will copy the old data back to the new database at the old location?
Yes, exactly.
autovacuum_freeze_max_age | 800000000
Normally the autovacuum seem to keep up sufficiently. We got into trouble with a bad combination of not-typical long-running queries, a disk failure and subsequent zpool repair, and the only person who checks the log files regularly (me) was out of the office. This has been so painful and slow to recover from, I don't think we'll ever get into this mess again. (At least not quite like this. I seem to have a knack for finding *new* ways to break things.)
A perfect storm of events, eh? It's hard to have things always go smoothly in the face of such events, but I see a couple of things you might want to consider. Increasing autovacuum_freeze_max_age reduces the amount of time you have to get back on track before the wraparound point. You might want to take that back down to the default. There was a bug causing wraparound-prevention autovacuums to trigger too frequently, which is now fixed in the latest minor releases, so making that change might not be as painful as you expect. Make sure you are monitoring for long-running transactions, so you don't get burned by one that is accidental. Also, if you have a processing cycle where there are off-peak hours on a daily or weekly basis, you might want to run a VACUUM ANALYZE command during those windows, to get some of the freezing done before it is critical.
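A monitoring check along those lines could be a sketch like this, using the standard pg_stat_activity view (the one-hour threshold is purely illustrative; tune it to your workload):

```sql
-- Find transactions that have been open for more than an hour
-- (threshold is illustrative, not a recommendation).
SELECT pid, usename, state, xact_start,
       now() - xact_start AS xact_age,
       query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND now() - xact_start > interval '1 hour'
ORDER BY xact_start;
```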
We talked a little bit about lowering autovacuum_freeze_max_age, at least some, but there was concern that it would end up doing a lot more lengthy full-table scans. Is that a legitimate concern? Would it be prudent to change any of the other values back to their defaults at the same time? For example, we have autovacuum_vacuum_cost_delay set to 0, with the idea that we don't mind taking a performance hit while autovacuum is running; our priority is that it finish as quickly as possible. If we start the vacuum earlier, though, maybe that should be bumped up too?
Does autovacuum do different work when it is vacuuming to prevent wraparound (that's triggered when a table's age passes autovacuum_freeze_max_age, right?) than when a vacuum is triggered by the table changing by a certain amount, or invoked manually?
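For reference, a query along these lines (against the standard pg_class catalog) shows how close each table's relfrozenxid age is to the autovacuum_freeze_max_age cutoff that forces an anti-wraparound vacuum:

```sql
-- Show the oldest tables by transaction-ID age, alongside the
-- autovacuum_freeze_max_age setting that forces an anti-wraparound vacuum.
SELECT c.oid::regclass AS table_name,
       age(c.relfrozenxid) AS xid_age,
       current_setting('autovacuum_freeze_max_age')::bigint AS freeze_max_age
FROM pg_class c
WHERE c.relkind = 'r'
ORDER BY age(c.relfrozenxid) DESC
LIMIT 10;
```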
(Are there any books, or articles, that cover "Vacuuming and Autovacuuming: the gory details"?)
Rereading the Routine Vacuuming page in the docs, this sentence caught my eye:
"However, for static tables (including tables that receive inserts, but no updates or deletes), there is no need to vacuum for space reclamation, so it can be useful to try to maximize the interval between forced autovacuums on very large static tables. Obviously one can do this either by increasing autovacuum_freeze_max_age or decreasing vacuum_freeze_min_age."
We generally never delete from this database at all. This case was unusual; I was migrating the data from one table to another because we added some columns, and changed the datatype of many of the columns from text to more appropriate types (timestamp, int, uuid, inet, etc). Ideally, even then we wouldn't have preferred to delete anything until the whole table was migrated, but disk space became an issue. Bleh.
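A hypothetical sketch of that kind of migration (all table and column names invented for illustration): create the new table with proper types, then cast the text columns on the way across.

```sql
-- Hypothetical target table with proper types instead of text.
CREATE TABLE events_new (
    event_time  timestamptz,
    user_id     int,
    session_id  uuid,
    client_addr inet,
    payload     text
);

-- Cast each text column to its new type during the copy.
INSERT INTO events_new
SELECT  event_time::timestamptz,
        user_id::int,
        session_id::uuid,
        client_addr::inet,
        payload
FROM events_old;
```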
With that in mind, would you still recommend putting autovacuum_freeze_max_age back to 200 million? I will also take this opportunity to mention again that if anyone is considering making a patch for 64-bit xids, you would make at least one small group of people very, very happy. :)
While 64-bit xids aren't likely, there is some work taking a more creative approach to the issue which might make you even happier in a year or so. :-)
Oh, the suspense! I will be on the lookout for that!
Side question: some of the longer queries that were running when the database yakked were deletes of old data that had been manually migrated to a new table with a more appropriate format. We were running out of disk space, so we were trying to clear up some space by removing data we now had in two places. Does a delete of previously-frozen rows unfreeze them, or anything like that? Because in a series of maybe a dozen queries or so, we deleted billions of rows. Does that generate a significant amount of extra work for the autovacuumer?
The pages which had tuples deleted would need to be cleaned up by vacuum, and rewritten. It would also remove all index entries for all deleted rows. It might also scan backward from the end of the table to release space to the OS. That could conceivably be enough avoidable work to make your idea of copying out the remaining data feasible. What percentage of the rows were deleted? Could your copy out be to a separate set of drives?
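Copying out the surviving rows and then dropping the old table, rather than deleting and vacuuming, might look something like this (table name and predicate are hypothetical; the point is that DROP or TRUNCATE reclaims space immediately and leaves nothing for vacuum to clean up):

```sql
-- Copy the surviving rows into a new table,
-- then drop the old table wholesale instead of vacuuming it.
CREATE TABLE big_table_keep AS
SELECT * FROM big_table
WHERE NOT migrated;   -- whatever predicate identifies the kept rows

-- Dropping the old table frees its space immediately, no vacuum needed.
DROP TABLE big_table;
ALTER TABLE big_table_keep RENAME TO big_table;
```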
I just did a little estimating, and it looks like around 48% of the data from one of the 14TB tables was deleted, about 45 billion rows. Sounds like I gave the vacuum a major amount of work to do. Yikes. I know the server I'm working on has over 300TB free, but I'm not certain of how it's set up. I'll check to see if it's possible to copy to a separate set of drives.

If you know at the time a row is added what group it will be in for deletion, it might pay to move to partitioning, so that a group of rows could be deleted pretty much as fast as you can drop a table.
Partitioning has come up occasionally in the past, as this database can be so cumbersome to work with. I'll look into that some more, and try to pitch it to the Official Deciders.
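On a 9.x-era server, partitioning is typically built with table inheritance and CHECK constraints; a minimal sketch (all names hypothetical) of how retiring a month of data becomes as fast as dropping a table:

```sql
-- Inheritance-based partitioning, as available on 9.x:
CREATE TABLE log_parent (
    logged_at timestamptz NOT NULL,
    message   text
);

-- One child table per month, constrained to its date range.
CREATE TABLE log_2013_06 (
    CHECK (logged_at >= '2013-06-01' AND logged_at < '2013-07-01')
) INHERITS (log_parent);

-- Retiring a month of data is then a near-instant metadata operation:
DROP TABLE log_2013_06;
```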
-- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
-- Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-admin