On 3/29/11 5:26 PM, Daniel Taylor wrote:
> Thanks for the suggestions.  Tao Ma's got me started, but doing some
> of the more "devious" tests is on my list, too.
>
> The original issue was that during component stress testing, we were
> seeing instances of the ext4 file system becoming "read-only" (showing
> in /proc/mounts, but not "mount").  Looking back through the logs, we
> saw that at mount time, there was a complaint about a corrupted journal.

So, did it go "read-only" right at mount time due to a journal replay
failure?  Or ...

> Some writing had occurred before the change to read-only, however.

That makes it sound like it did get mounted ok ... and then something
went wrong?  What did the logs say?

> The original mount script didn't check for any "mount" return value, so
> we theorized that ext4 just got to a point where it couldn't sensibly
> handle any more changes.

I'm not sure what that means, TBH :)

Just want to make sure you're barking up the right tree, here ...

-Eric

> It seemed that the right answer was to check the return value from mount
> and, if non-0, umount the file system, fix it, and try again.  To test
> the return value from mount, I need to be able to corrupt, but not
> destroy, the journal, since the component tests were taking days to show
> the failure.
>
> Running an "fsck -f" every time on a 3TB file system with an embedded
> PPC was just taking too much time to impose on a consumer-level customer.
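
For what it's worth, a minimal sketch of that "check mount, repair,
retry" path might look something like the below.  This is just my
reading of the approach you described, not your actual script; the
device and mountpoint names are placeholders:

    #!/bin/sh
    # Rough sketch: mount, and on failure try a repair and one retry.
    DEV=/dev/sda1        # placeholder device
    MNT=/mnt/data        # placeholder mountpoint

    if ! mount -t ext4 "$DEV" "$MNT"; then
        # Mount failed (e.g. journal problem); make sure nothing is
        # half-mounted, repair, and try once more.
        umount "$MNT" 2>/dev/null
        e2fsck -p "$DEV" || e2fsck -y "$DEV"
        mount -t ext4 "$DEV" "$MNT"
    fi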
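
And for the "corrupt but don't destroy the journal" part, one crude way
to provoke a journal complaint at mount time might be to scribble over a
single block inside the internal journal of an *unmounted* test
filesystem.  This is purely a guess at a test rig, with an internal
journal at inode <8> and a 4k block size assumed:

    #!/bin/sh
    # Hypothetical journal-corruption helper for an UNMOUNTED test fs.
    DEV=/dev/sdb1        # placeholder test device

    # List the blocks used by the journal inode and pick one of them
    # (here, arbitrarily, the 20th).
    BLK=$(debugfs -R "blocks <8>" "$DEV" 2>/dev/null | tr ' ' '\n' | sed -n '20p')

    # Overwrite that one block with garbage (assumes 4k fs blocks).
    dd if=/dev/urandom of="$DEV" bs=4096 seek="$BLK" count=1 conv=notrunc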