On 2/16/11 4:12 PM, Andreas Dilger wrote:
> On 2011-02-16, at 11:12, Eric Sandeen wrote:
>> Anaconda (the Fedora/RHEL installer) had been "fixing up" extN
>> filesystems it created by setting the max mount count and check
>> interval to 0, as well as adding user_xattr to filesystem mount
>> options.
>>
>> As part of their efforts to stop special-casing around upstream
>> defaults, they've removed these changes upstream.
>>
>> However, I'd like to at least propose that these changes be made
>> default.
>
> I'd really prefer instead that the "lvcheck" script be included into
> the distro, instead of changing mke2fs. That achieves the same end
> result (periodic scrubbing of the filesystem to look for hidden
> errors), without introducing boot-time delays. Given the size of
> disks today and the undetected bit-error-rate (somewhere around
> 1/10^15 bits or 12TB), I think it is important that there be
> automated scrubbing of the filesystem.

lvcheck is well and good, but is not a panacea; it is useful only
for snapshottable volumes.... and only lvm for now?

> I think the best place to put that script would be in the lvm tools
> (since it is applicable to multiple filesystems), which I think Eric
> has the most leverage in getting accepted (I've been but I'd be OK
> including it with e2fsprogs if there is pushback on that.

device-mapper utilities ended up being a black hole... combination of
"the scripts don't conform to our style" or somesuch, but no real
interest in adopting & fixing them to do so, IIRC.

>> The forced fsck often comes at unexpected and inopportune moments,
>> and even enterprise customers are often caught by surprise when
>> this happens. Because a filesystem with an error condition will be
>> marked as requiring fsck anyway,
>
> Any decent RAID array does background scrubbing for integrity
> verification, it doesn't just wait until there is an uncorrectable
> error detected in the block device. If we can do something proactive
> to prevent this (i.e.
> lvcheck run by cron.weekly), it is worthwhile.

If the raid went offline for a couple hours at random times to do this,
users would scream too. This is essentially what the forced fsck does
today.

> I think customers are equally surprised when their server fails
> (remount-ro/panic) due to the kernel detecting an error that might
> have been on disk for weeks or months.

If I were an administrator, I would schedule fscks to avoid this, rather
than rely on a "kludgy hack of using the UUID to derive a random" time
for this to hit...

>> I submit that the time-based and mount-based checks are not
>> particularly useful, and that administrators can schedule fscks on
>> their own time, or tune2fs the enforced intervals if they so
>> choose.
>
> I think you are projecting your own self-enlightenment onto users
> ;-). As we see on this list, there are many users that don't even
> back up their critical data, so IMHO taking out "safe by default"
> options is a step in the wrong direction.

Perhaps I'll whip up a s_last_backup_time patch, and refuse to mount if
the user hasn't conformed to our enlightened notions of how often is
often enough, as well. I could integrate it with dumpe2fs. ;)

There is "safe by default" and then there is "assuming administrator
responsibilities," IMHO. I just personally think it's too much.

> Attached is my latest version of the lvcheck script, and a default
> /etc/lvcheck.conf script. It's been enhanced to include a usage
> message, command-line option parsing to override default parameters,
> and the ability to check snapshots of ext3/4 filesystems with an
> external journal.
>

The script is great, but has limited application.

Well, anyway, I knew this wouldn't be super popular with everyone, but
figured I'd put it out there for discussion.
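
As a side note on the error-rate figure quoted earlier in the thread, here
is a rough sketch of the arithmetic, assuming a uniform rate of one bit
error per N bits read and decimal terabytes (10^12 bytes). The function
names are only illustrative and do not come from any tool discussed here.

```python
# Rough arithmetic: how much data is read, on average, per bit error
# when the error rate is one error per `bits_per_error` bits.
# Assumptions: uniform error rate, decimal terabytes (1 TB = 10**12 bytes).


def bytes_per_error(bits_per_error: float) -> float:
    """Expected number of bytes read per single bit error."""
    return bits_per_error / 8


def to_terabytes(n_bytes: float) -> float:
    """Convert a byte count to decimal terabytes."""
    return n_bytes / 1e12


if __name__ == "__main__":
    for exponent in (14, 15):
        tb = to_terabytes(bytes_per_error(10 ** exponent))
        print(f"1 error per 10^{exponent} bits ~ one error per {tb:g} TB read")
```

Under these assumptions, one error per 10^14 bits works out to about
12.5 TB read per error, and one error per 10^15 bits to about 125 TB, so
the "12TB" in the quoted text appears to match the 10^14 rate rather than
10^15.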

-Eric

> Cheers, Andreas
>
>
>

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html