On Mar 19, 2007 17:15 -0400, ahlist wrote:
> Quite often we'll have a server that either needs a really long fsck
> (10 hours - 200 gig drive) or an fsck that eventually results in
> everything going to lost+found (pretty much a total loss).

Strange. We get 1TB/hr fscks these days unless the filesystem is
completely corrupted and has a lot of duplicate blocks.

> Would rebooting these servers monthly (or some other frequency) stop this?

What is also important is that when you run fsck you pass "-f" so it
actually checks the filesystem instead of just the superblock. e2fsck
will only do a full check if the kernel detected disk corruption, OR if
the "last checked" time is more than 6 months old, OR if the mount count
since the last check has exceeded its limit (randomized between 20 and
40 mounts at mkfs time). See tune2fs(8) for details.

> Is it correct to visualize this as small errors compounding over time,
> thus more frequent reboots would allow quick fscks to fix the errors
> before they become huge?

That is definitely true. If the bitmaps get corrupted, then this will
spread corruption throughout the filesystem.

> (OS is redhat 7.3 and el3)

I would instead suggest updating to a newer kernel (e.g. RHEL4 2.6.9),
as this has fixed a LOT of bugs in ext3. Also make sure you are using
the newest e2fsck available, as some bugs have been fixed there too.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

_______________________________________________
Ext3-users mailing list
Ext3-users@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/ext3-users
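[A sketch of the forced-check and mount-count behavior discussed above, run against a scratch image file rather than a real disk. The file path and sizes are arbitrary examples; this assumes e2fsprogs (mke2fs, e2fsck, tune2fs) is installed.]

```shell
# Create a small scratch ext2 image in a regular file (no real disk touched).
dd if=/dev/zero of=/tmp/fscktest.img bs=1M count=8 2>/dev/null
mke2fs -q -F /tmp/fscktest.img

# Force a full filesystem check with -f; without it, e2fsck would only
# inspect the superblock on a filesystem marked clean. -n answers "no"
# to all repair prompts, so the run is read-only.
e2fsck -f -n /tmp/fscktest.img

# Show the mount-count and check-interval settings that decide when a
# full check is done automatically at mount time.
tune2fs -l /tmp/fscktest.img | grep -Ei 'mount count|check'

# Tighten the thresholds: full check every 20 mounts or every 1 month,
# whichever comes first.
tune2fs -c 20 -i 1m /tmp/fscktest.img
```

On a production box the same `tune2fs -c`/`-i` invocation against the real block device would make the periodic full check more frequent, which is the "catch small errors before they compound" idea from the thread.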