On Tue, 4 Mar 2008 20:01:09 +0100 Jan Kara <jack@xxxxxxx> wrote:

>   Hi,
>
> On Tue 04-03-08 13:39:41, Josef Bacik wrote:
> > jbd and I want a way to verify that I'm not screwing anything up in the
> > process, and this is what I came up with.  Basically this option would only be
> > used in the case where someone mounts an ext3 image or fs, does a specific IO
> > operation (create 100 files, write data to a few files etc), unmounts the fs
> > and remounts so that jbd does its journal recovery and then check the status of
> > the fs to make sure it's exactly the way it's expected to be.  I'm not entirely
> > sure how useful an option like this would be (or if I did it right :) ),
> > but I thought I'd throw it out there in case anybody thinks it may be useful,
> > and in case there is some case that I'm missing so I can fix it and better make
> > sure I don't mess anything up while doing stuff.  Basically this patch keeps us
> > from resetting the journal's tail/transaction sequence when we destroy the
> > journal so when we mount the fs again it will look like we didn't unmount
> > properly and recovery will occur.  Any comments are much appreciated,
>   Actually, there is a different way how we've done checking like this (and
> I think also more useful), at least for ext3. Basically you mounted a
> filesystem with some timeout and after the timeout, device was forced
> read-only. And then you've checked that the fs is consistent after journal
> replay. I think Andrew had the patches somewhere...

About a billion years ago...

But the idea was (I think) good:

- mount the filesystem with `-o ro_after=100'

- the fs arms a timer to go off in 100 seconds

- now you start running some filesystem stress test

- the timer goes off.  At timer-interrupt time, flags are set which cause
  the low-level driver layer to start silently ignoring all writes to the
  device which backs the filesystem.  This simulates a crash or poweroff.

- Now up in userspace we

  - kill off the stresstest
  - unmount the fs
  - mount the fs (to run recovery)
  - unmount the fs
  - fsck it
  - mount the fs
  - check the data content of the files which the stresstest was writing:
    look for uninitialised blocks, incorrect data, etc.
  - unmount the fs

- start it all again.

So it's 100% scriptable and can be left running overnight, etc.  It found
quite a few problems with ext3/jbd recovery which I doubt could be found by
other means.  This was 6-7 years ago and I'd expect that new recovery bugs
have crept in since then which it can expose.

I think we should implement this in a formal, mergeable fashion, as there
are numerous filesystems which could and should use this sort of testing
infrastructure.
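
For illustration, one pass of the cycle described above could be driven by
something like the following.  This is only a rough sketch under stated
assumptions: `ro_after=` is the mount option proposed in this mail and is not
known to exist in any mainline kernel; the function name `run_cycle`, the
stress-test command and the `verify` callback are hypothetical placeholders.
With `dry_run=True` (the default) nothing is executed and the planned steps
are simply returned, so the ordering can be inspected without root.

```python
import subprocess
import time


def run_cycle(dev, mnt, stress_argv, verify=None, timeout_s=100, dry_run=True):
    """Run (or, with dry_run, only list) one simulated-crash test cycle."""
    log = []

    def sh(*argv):
        # Record the command; execute it only when not in dry-run mode.
        log.append(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)

    # ASSUMPTION: ro_after= is the option proposed in the mail, not a
    # mainline mount option.  It arms the "drop all writes" timer.
    sh("mount", "-t", "ext3", "-o", f"ro_after={timeout_s}", dev, mnt)

    # Start the stress test (hypothetical command supplied by the caller).
    log.append("start: " + " ".join(stress_argv))
    if not dry_run:
        stress = subprocess.Popen(stress_argv)
        # Wait until the timer has fired; writes are now silently ignored.
        time.sleep(timeout_s + 5)
        stress.kill()
        stress.wait()
    log.append("kill stress test")

    sh("umount", mnt)
    sh("mount", "-t", "ext3", dev, mnt)   # journal recovery runs here
    sh("umount", mnt)
    sh("e2fsck", "-f", "-n", dev)         # forced check, answer "no" to fixes
    sh("mount", "-t", "ext3", dev, mnt)

    # Check file contents: uninitialised blocks, incorrect data, etc.
    log.append("verify file contents")
    if not dry_run and verify is not None:
        verify(mnt)                       # hypothetical caller-supplied check

    sh("umount", mnt)
    return log


if __name__ == "__main__":
    # Placeholder device, mount point and stress command.
    for step in run_cycle("/dev/sdX1", "/mnt/test", ["./stress-test"]):
        print(step)
```

Wrapping `run_cycle(..., dry_run=False)` in a loop gives the "start it all
again" step; because `check=True` is used, a non-zero exit from e2fsck (or
any other command) raises an exception and stops the run at the failing cycle.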