Greg Smith wrote:
The soft update code used in FreeBSD makes sure that there's no damage to the filesystem that
PostgreSQL can't recover from. Once the WAL is replayed after a crash, the database is
consistent. The main purpose of the background fsck is to find "orphaned" space, things that the
filesystem incorrectly remembers the state of in regards to whether it was allocated and used. In
theory, there's no reason that can't happen in the background, concurrent with normal database
activity.
In practice, background fsck is such an infrequently used piece of code that it's developed a bit
of a reputation for being buggier than average. It's really hard to test it, filesystem code is
complicated, and the sort of inconsistent data you get after a hard crash is often really
surprising. I wouldn't be too concerned about the database integrity, but there is a small risk
that background fsck will run into something unexpected and panic. And that's a problem you're
much less likely to hit using the more stable regular fsck code; thus the recommendations by some
to avoid it.
Thank you all for your responses.
Greg, given your opinion and the few reported issues I found on the net, I think I had better keep
background fsck disabled.
What I was primarily concerned about was the long wait in front of the console, watching slow
fsck messages and nervously confirming that the disk LEDs are still blinking. It's even harder with
a remote KVM, where the LEDs are not visible. But my personal comfort is not a priority anyway, so
I will let foreground fsck do its job for as long as it needs.
As I said in my other response, the problem initially comes from the machine hanging and having to
be manually power cycled. There is already significant downtime before the power cycle has a chance
to happen. So another forty minutes of fsck does not matter much from the point of view of
service availability.
fsck run time could be shortened if I used a lower inode density for the filesystem. I think
that makes a lot of sense for a filesystem fully dedicated to a postgres data cluster, especially if I
have few but large tables, which is mostly my case.
The system in question has:
df -hi | grep -E 'base|ifree'
Filesystem    Size  Used  Avail  Capacity  iused  ifree  %iused  Mounted on
/dev/da1p3    3.0T  1.7T   1.0T       63%   485k   392M      0%  /pg/base
(will I ever have even tens of millions of tables?)
I reserved fewer inodes in a newer, bigger system:
Filesystem    Size  Used  Avail  Capacity  iused  ifree  %iused  Mounted on
/dev/mfid0p8   12T  4.8T   6.0T       45%   217k    49M      0%  /pg/base
or even fewer in a yet newer one:
Filesystem    Size  Used  Avail  Capacity  iused  ifree  %iused  Mounted on
/dev/mfid0p1   12T  3.6T   7.4T       33%   202k   3.4M      6%  /pg/base
(oops, maybe too aggressive here?)
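For a rough sense of the densities involved, here is a small sketch that divides each filesystem's
size by its total inode count (iused + ifree), using the df figures above. It assumes that df -h
reports sizes in base-2 units and inode counts in base-10 units; the function name bytes_per_inode
is made up for illustration, and the results are only approximate because df rounds its output.

```python
# Rough bytes-per-inode estimate from the df -hi figures quoted above.
# Assumptions: sizes are base-2 (1T = 2**40 bytes), inode counts are
# base-10 (1k = 1e3, 1M = 1e6). df rounds, so treat results as estimates.

def bytes_per_inode(size_tib, iused, ifree):
    """Approximate filesystem bytes per allocated inode slot."""
    total_inodes = iused + ifree
    return size_tib * 2**40 / total_inodes

# device: (size in TiB, inodes used, inodes free), copied from the df output
systems = {
    "/dev/da1p3":   (3.0,  485e3, 392e6),
    "/dev/mfid0p8": (12.0, 217e3, 49e6),
    "/dev/mfid0p1": (12.0, 202e3, 3.4e6),
}

for dev, (size_tib, iused, ifree) in systems.items():
    density = bytes_per_inode(size_tib, iused, ifree)
    print(f"{dev}: about {density / 1024:.0f} KiB per inode")
```

By this estimate the first filesystem sits at roughly 8 KiB per inode, the second at a few hundred
KiB, and the newest one at a few MiB per inode.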
When I forced a power drop on these two other systems to check how they survive, fsck on
them took substantially less time.
In the context of inode density, let me ask one more question. Does tuning it this way have
any other significant impact, good or bad, on system performance?
Irek.
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)