Search Postgresql Archives

Re: Startup process thrashing

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 11 Dec 2008, Phillip Berry wrote:

I've got a bit of a problem. It started last night when postgres (8.1.9) went down citing the need for a vacuum full to be done due to the transaction log needing to wraparound.

Not exactly. What it said was "To avoid a database shutdown, execute a full-database VACUUM". In that context, "full" means you vacuum everything in the database, but only regular VACUUM is needed. VACUUM FULL, as you learned the hard way, is a more intensive operation, and it's not needed to resolve the problem you started with. It's a somewhat unfortunately worded HINT.

During the vacuum of the larger of the databases a few hours in it failed, it's filled up the 18GB pg_xlog partition with over 1000 wal files. Due to running out of space the vacuum failed.

Ouch. Are you running PITR recovery by setting archive_command? Did you set checkpoint_segments to some very high value? 1000 WAL files is not normal, curious how you ended up with so many of them.

When I came in this morning I attempted to start postgres using the normal init script, and now it's
stuck.  The startup process is thrashing the disks and working hard, pg_controldata says it's in
recovery, but it's been going for over two hours.

It takes a long time to sort through 1000 WAL files and figure out if the database is consistent for every transactions mentioned there. If your VACUUM FULL ran for several hours and kicked out 1000 of them, it would be reasonable to expect the cleanup to also take many hours.

My question is where I should go from here?  Should i kill the startup script, clear out the excess
wal files, start the standalone server and try vacuum again?

Deleting the WAL files like that will leave your database completely trashed. The utility that's provided to do the job you're considering is pg_resetxlog:

http://www.postgresql.org/docs/8.1/static/app-pgresetxlog.html

Which is slightly safer, but note the dire warnings there. You are very likely to get some sort of data corruption if you do that, and you won't know where it is. You'll be facing a dump and restore to sort that out, and if you think the server startup is taking a while on a 156GB database you're really not going to be happy with how long a restore takes.

The safest thing you can do here is just wait for the server to finish recovery so it starts up. Watch the system activity with something like vmstat. If the server process is busy using the CPU and it's doing stuff with the disks, if you have evidence it's making progress, you'll be hard pressed to execute any manual recovery that's any safer or more efficient than that is.

Someone else may be able to point you toward better estimating how far it's got left to go, I haven't ever been stuck in your position for long enough before to figure that out myself. Good luck.

--
* Greg Smith gsmith@xxxxxxxxxxxxx http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux