On Apr 25, 2002 14:45 -0400, Darrell Michaud wrote:
> I did not spend too much time dumbing down the DMA, mostly because the
> consequences of that, even if it worked, would be unacceptable.

But at least it would tell you (us) if that is the source of the
corruption, and then you could start looking for a fix for the DMA
problem. If you _do_ have the opportunity to test this, please let us
know. In most cases, you probably won't notice the change in
performance between DMA and non-DMA anyway. It is only really
noticeable with very large files (sustained throughput and increased
CPU usage), because I/O time is otherwise dominated by seeking.

> I did try using a "normal" partition instead of the software raid,
> however, and that has resolved the corruption completely. I'm still
> hoping that there's a workaround that would allow the use of software
> raid with these drives on this chipset (i860) with ext3, but for the
> time being a non-raid setup is ok.

Hmm, that is the first time that I have heard it reported that MD RAID
is a source of problems on 2.4 kernels. It is true that on 2.2 kernels
you should not mix MD RAID and ext3 (or reiserfs), but I _thought_ it
was OK on 2.4. Maybe a bug has crept in somewhere.

> I don't *think* that the corruption was related to the use of fsck,
> because I only started checking the other 7 systems after extensively
> testing one. There's always the chance that it could be a cruel truth,
> however :P

Well, the reason fsck might cause problems is if you run it on the root
filesystem (which is mounted at the time fsck is run) and for some
reason there is a discrepancy between what the kernel sees and what
e2fsck reads/writes to the device (i.e. cache coherency issues). Over
time you would get pages dropped from RAM and re-read from disk, and
what e2fsck put on the disk is not what the kernel was expecting.
Again, this is just a possible cause of this problem...

> Yes, you're right.
> e2fsck does run all the time and "recover the
> journal" without mention of any problems. However, if I boot from a
> different raid/ext3-aware boot/root source and perform a full, manual
> fsck on the root filesystem it will tell me that there are errors that
> need to be repaired. The amount and severity of the errors grow with
> time, but are not detected by the "journal" fsck.

Well, a "journal" fsck is not an fsck at all, really. It's just e2fsck
seeing that the journal has data, writing the data to the filesystem,
and then checking the superblock and finding it marked "clean" (as it
always is with ext3, unless there is an error).

> > Both of you are using MD RAID. Is there a possibility to disable MD
> > raid on the root device and see if this fixes things? This is obviously
> > a lot easier to do on the mirrored root filesystem (change back to using
> > one of the raw devices instead of the MD device, and disable MD for that
> > device, including RAID autostart where you need to change the partition
> > type).
>
> This was going to be my next troubleshooting step.. Wish me luck :P

Well, you just said above that using a "normal" partition resolves the
problem completely...

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/