Re: Background fsck

Ireneusz Pluta <ipluta@xxxxx> · Fri, 08 Apr 2011 11:34:53 +0200

Greg Smith wrote:
> The soft update code used in FreeBSD makes sure that there's no damage to the filesystem that 
> PostgreSQL can't recover from.  Once the WAL is replayed after a crash, the database is 
> consistent.  The main purpose of the background fsck is to find "orphaned" space, things that the 
> filesystem incorrectly remembers the state of in regards to whether it was allocated and used.  In 
> theory, there's no reason that can't happen in the background, concurrent with normal database 
> activity.
>
> In practice, background fsck is such an infrequently used piece of code that it's developed a bit 
> of a reputation for being buggier than average.  It's really hard to test it, filesystem code is 
> complicated, and the sort of inconsistent data you get after a hard crash is often really 
> surprising.  I wouldn't be too concerned about the database integrity, but there is a small risk 
> that background fsck will run into something unexpected and panic.  And that's a problem you're 
> much less likely to hit using the more stable regular fsck code; thus the recommendations by some 
> to avoid it.

Thank you all for your responses.

Greg, given your opinion and the few issues I found raised on the net, I think I had better keep
background fsck disabled.

What I was primarily concerned about was the long wait in front of the console, watching the slow
fsck messages and nervously checking that the disk LEDs were still blinking. It's even harder over a
remote KVM, where the LEDs can't be seen. But my personal comfort is not a priority anyway, so I'll
let foreground fsck do its job for as long as it needs.

As I said in another response, the problem originates with the machine hanging and having to be
manually power cycled. There is already significant downtime before the power cycle has a chance to
happen, so another forty minutes of fsck does not matter much from the point of view of service
availability.

fsck run time could be shortened by using a lower inode density for the filesystem. I think that
makes a lot of sense for a filesystem fully dedicated to a PostgreSQL data cluster, especially one
holding relatively few but large tables, which is mostly my case.
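To put rough numbers on that trade-off: FreeBSD's newfs sets the density with `-i bytes`, the amount of data space per inode, so the inode count is roughly filesystem size divided by that value. The sketch below assumes exactly that simple division and ignores per-cylinder-group rounding and metadata overhead. The three densities are hypothetical examples, not recommendations.

```python
TIB = 1024 ** 4  # bytes in one TiB


def approx_inode_count(fs_bytes: int, bytes_per_inode: int) -> int:
    """Rough inode count for a given density (ignores cylinder-group rounding)."""
    return fs_bytes // bytes_per_inode


if __name__ == "__main__":
    fs_size = 3 * TIB  # roughly the size of the 3.0T /pg/base filesystem
    # Hypothetical values one might pass to `newfs -i`.
    for density in (8 * 1024, 256 * 1024, 4 * 1024 * 1024):
        count = approx_inode_count(fs_size, density)
        print(f"-i {density:>8}: ~{count:,} inodes")
```

At 8 KiB per inode this gives about 403 million inodes for 3 TiB, the same order as the ~392M shown by df below. At 4 MiB per inode the figure drops to under a million.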

The system in question has:

df -hi | grep -E 'base|ifree'
Filesystem     Size    Used   Avail Capacity iused     ifree %iused  Mounted on
/dev/da1p3     3.0T    1.7T    1.0T    63%    485k      392M    0%   /pg/base
(will I ever have even tens of millions of tables?)

I reserved fewer inodes on a newer, bigger system:
Filesystem            Size    Used   Avail Capacity iused ifree %iused  Mounted on
/dev/mfid0p8           12T    4.8T    6.0T    45%    217k   49M    0%   /pg/base

or even fewer on a yet newer one:
Filesystem            Size    Used   Avail Capacity iused ifree %iused  Mounted on
/dev/mfid0p1           12T    3.6T    7.4T    33%    202k  3.4M    6%   /pg/base
(oops, maybe too aggressive here?)
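The df figures above can be converted back into an approximate bytes-per-inode density and the share of inodes in use. This is a minimal sketch under two assumptions: `df -h` reports sizes in binary units (T = 2^40) and inode counts with decimal suffixes (k = 10^3, M = 10^6). Because df rounds every value, the results are only ballpark figures.

```python
# Rounded figures copied from the df -hi output above: (size, iused, ifree).
SYSTEMS = {
    "da1p3 (3.0T)": ("3.0T", "485k", "392M"),
    "mfid0p8 (12T)": ("12T", "217k", "49M"),
    "mfid0p1 (12T)": ("12T", "202k", "3.4M"),
}

SIZE_UNITS = {"G": 1024 ** 3, "T": 1024 ** 4}  # assumption: binary units for sizes
COUNT_UNITS = {"k": 10 ** 3, "M": 10 ** 6}     # assumption: decimal units for inode counts


def parse(value: str, units: dict) -> float:
    """Turn a df-style value such as '3.4M' into a plain number."""
    return float(value[:-1]) * units[value[-1]]


def density(size: str, iused: str, ifree: str) -> tuple:
    """Return (approximate bytes per inode, percent of inodes used)."""
    used = parse(iused, COUNT_UNITS)
    total_inodes = used + parse(ifree, COUNT_UNITS)
    bytes_per_inode = parse(size, SIZE_UNITS) / total_inodes
    return bytes_per_inode, 100 * used / total_inodes


if __name__ == "__main__":
    for name, fields in SYSTEMS.items():
        bpi, pct = density(*fields)
        print(f"{name}: ~{bpi / 1024:,.0f} KiB per inode, {pct:.1f}% of inodes used")
```

Under these assumptions the three filesystems work out to very roughly 8 KiB, 260 KiB and 3.5 MiB per inode. The computed 5.6% in use on the newest system agrees with the 6% that df reports.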

When I forced a power loss on these two other systems to check how they would survive, fsck took
substantially less time on them.

On the subject of inode density, let me ask one more question: does tuning it this way have any
other significant impact, good or bad, on system performance?

Irek.

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance