On Apr 25, 2002 09:14 -0400, Darrell Michaud wrote:
> There are three data partitions on each drive, each of which is
> mirrored (/boot, /, and /home). Over time, depending on overall disk
> use and NOT on use on a particular filesystem, the / filesystem becomes
> corrupt. Strangely enough, I can run bonnie, dd tests, copies, etc. all
> day on the /home and /boot ext3 filesystems and they have never become
> corrupt; only the / partition does.
>
> I have a lot of data points for this behavior. 8 of these machines, all
> identical in configuration, exhibit the same symptoms.

Hmm, that does seem ominous. Did you ever notice whether the corruption
is actually _related_ to running e2fsck on the root partition and/or to
crashes, or does it get corrupted even after normal usage?

Have you checked the linux-kernel (l-k) mailing list archives for any
known DMA/IDE problems with your chipset? Have you tried booting with
"ide=nodma" as a kernel option to see if that helps?

> What's aggravating my problem is that for some reason the root
> filesystem is only fsck'd on boot when a power-off event occurs.

Well, e2fsck _should_ run on every boot, but it will normally report a
clean filesystem and continue. If there was a crash it will normally
report something like "journal recovered" and then a clean filesystem.
It may be that your startup scripts are too "nice" and hide the output
of fsck from you.

> If I manually set needs_** with hdparm it is ignored (or possibly reset
> upon a clean shutdown). I posted these symptoms last month in hopes that
> someone had seen them before. I got some hints to check my /etc/fstab
> file to make sure / gets fsck'd, but that was ok.

hdparm?
You can tell e2fsck to run a full fsck on each boot in several ways:
1) create a /forcefsck file (you may have to do this on each boot)
2) use "tune2fs -c 1 <dev>" to force an fsck on each mount
3) create a file /fsckoptions with "-f" in it
4) create a file /etc/sysconfig/autofsck with "AUTOFSCK_DEF_CHECK=yes" in it

Options 3 and 4 may be RedHat-specific.

> > md1 : active raid5 ide/host2/bus1/target0/lun0/part1[2]

Both of you are using MD RAID. Is it possible to disable MD RAID on the
root device and see if that fixes things? This is obviously a lot easier
to do on a mirrored root filesystem: switch back to using one of the raw
devices instead of the MD device, and disable MD for that device,
including RAID autostart, for which you need to change the partition type.

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/