Search Postgresql Archives

Re: Startup process thrashing

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 11 Dec 2008, Phillip Berry wrote:

I'm not running PITR and checkpoint_segments is set to 100 as this is home to a very write intensive app.

That's weird then. It shouldn't ever keep around more than 201 WAL segments. I've heard one report of a similarly mysterious excess of them, from Robert Treat, but that was probably caused by a hardware failure.

1. Will copying the pg_xlog dir onto a larger partition cause any problems?

Ah, I didn't realize the disk it's on was still 100% full from what you said. You're not going to get anywhere until that's sorted out. You absolutely can move them all to another disk, you just can't delete them altogether or split them across logical disks. The usual technique is something like this:

cd $PGDATA
mv pg_xlog $BIGGER
ln -s $BIGGER/pg_xlog

And then fire the server up in single-user mode to issue the VACUUM it's begging for. Once it's back to running in regular mode again, issue a "CHECKPOINT" command, and it should erase most of those WAL files. Then you could stop the server and reverse the above:

cd $PGDATA
unlink pg_xlog
mv $BIGGER/pg_xlog .

While filling that disk and grinding to a halt is bad, it's not necessarily any worse then the alternative if you didn't have it on a dedicated filesystem--that this out of control xlog creation might create an even greater backlog of segments.

2. Is there any reason that vacuuming would create so many WAL files? I've looked around and can't find any explaination, though it's a problem that seems to have happened before.

Nothing should make that many WAL files, normally the only way you can get into that position is if you're running PITR recovery but not archiving the segments--so they just keep piling up but the server can't recycle them. What's supposed to happen is that every time there is a checkpoint, it deletes anything beyond 2*checkpoint_segments+1 worth of segments, that's where the 201 figure I mentioned comes from.

3. I installed pgbouncer yesterday for connection pooling, does the fact that it holds connections open have any bearing on how rapidly the transaction IDs are used up? As in are transaction IDs in anyway related to connections? It's a pretty interesting coincidence.

I'm not really familiar with pgbouncer to comment on the specifics of what it does. In general, keeping transactions open longer will impact the pg_clog section of the database, but that shouldn't cause the WAL to grow out of control. Once you get this back into production, you certainly should carefully monitor how much disk space is being taken up on the WAL disk moving forward. If the count of them there gets back over 201 again, that's an indication there's something that shouldn't be happening going on. That's the sort of thing you can get more information about

This isn't a real popular time of day for this list, some get some more ideas might show up once our many European members start posting in volume. As a general aside, if you ever find yourself in this position again, where you've got an urgent database problem, something you might do in parallel with posting here is trying the IRC channel: http://wiki.postgresql.org/wiki/IRC2RWNames

--
* Greg Smith gsmith@xxxxxxxxxxxxx http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux