On Thu, 11 Dec 2008, Phillip Berry wrote:
I'm not running PITR and checkpoint_segments is set to 100 as this is
home to a very write intensive app.
That's weird then. It shouldn't ever keep around more than 201 WAL
segments. I've heard one report of a similarly mysterious excess of them,
from Robert Treat, but that was probably caused by a hardware failure.
1. Will copying the pg_xlog dir onto a larger partition cause any
problems?
Ah, I didn't realize the disk it's on was still 100% full from what you
said. You're not going to get anywhere until that's sorted out. You
absolutely can move them all to another disk, you just can't delete them
altogether or split them across logical disks. The usual technique is
something like this:
cd $PGDATA
mv pg_xlog $BIGGER
ln -s $BIGGER/pg_xlog
And then fire the server up in single-user mode to issue the VACUUM it's
begging for. Once it's back to running in regular mode again, issue a
"CHECKPOINT" command, and it should erase most of those WAL files. Then
you could stop the server and reverse the above:
cd $PGDATA
unlink pg_xlog
mv $BIGGER/pg_xlog .
While filling that disk and grinding to a halt is bad, it's not
necessarily any worse then the alternative if you didn't have it on a
dedicated filesystem--that this out of control xlog creation might create
an even greater backlog of segments.
2. Is there any reason that vacuuming would create so many WAL files?
I've looked around and can't find any explaination, though it's a
problem that seems to have happened before.
Nothing should make that many WAL files, normally the only way you can get
into that position is if you're running PITR recovery but not archiving
the segments--so they just keep piling up but the server can't recycle
them. What's supposed to happen is that every time there is a checkpoint,
it deletes anything beyond 2*checkpoint_segments+1 worth of segments,
that's where the 201 figure I mentioned comes from.
3. I installed pgbouncer yesterday for connection pooling, does the fact
that it holds connections open have any bearing on how rapidly the
transaction IDs are used up? As in are transaction IDs in anyway related
to connections? It's a pretty interesting coincidence.
I'm not really familiar with pgbouncer to comment on the specifics of what
it does. In general, keeping transactions open longer will impact the
pg_clog section of the database, but that shouldn't cause the WAL to grow
out of control. Once you get this back into production, you certainly
should carefully monitor how much disk space is being taken up on the WAL
disk moving forward. If the count of them there gets back over 201 again,
that's an indication there's something that shouldn't be happening going
on. That's the sort of thing you can get more information about
This isn't a real popular time of day for this list, some get some more
ideas might show up once our many European members start posting in
volume. As a general aside, if you ever find yourself in this position
again, where you've got an urgent database problem, something you might do
in parallel with posting here is trying the IRC channel:
http://wiki.postgresql.org/wiki/IRC2RWNames
--
* Greg Smith gsmith@xxxxxxxxxxxxx http://www.gregsmith.com Baltimore, MD
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general