On 10/24/2010 03:08 AM, Ted Ts'o wrote:
> On Sun, Oct 24, 2010 at 02:20:45AM +0200, Bernd Schubert wrote:
>> Hmm, maybe we have a misunderstanding here. If we could make e2fsck
>> *only* recover the journal, that would be perfect. Kernel and
>> e2fsck journal recovery should take approximately the same time. But
>> that option does not exist yet (well, a half-baked patch is on my
>> disk now). If e2fsck would then detect, as the kernel does:
>> "clear_journal_err: Filesystem error recorded from previous mount"
>> and mark the filesystem with an error, that would be all we need to
>> then abort the mount in the pacemaker script and allow us to run a
>> real e2fsck outside of pacemaker.
>
> What probably makes sense is to have an extended option which causes
> e2fsck to just run the journal and then exit. Part of running the
> journal should be setting the EXT4_ERROR_FS bit in s_mount_state and
> then clearing the journal. That seems to be missing entirely from
> e2fsck, which is a bug that we should fix regardless.

Adding the journal option is simple; I will provide a patch by
Wednesday or Thursday. I will also check whether it sets EXT2_ERROR_FS
and, if not, will try to find some time to add that.

>
> As far as detecting whether or not the file system has known errors,
> you can do that by using dumpe2fs -h and grepping for "Filesystem
> state". That can have the values "clean" or "with errors". (For ext2
> file systems, or ext4 file systems without a journal, you can also
> have the state "not clean" and "not clean with errors", but if you
> have a journal the latter two states shouldn't ever come up.)

I added exactly that to our lustre_server pacemaker agent last week :)
And when I noticed it still mounts filesystems with errors, I started
this thread here.

>
> That way the logic that you want is something you can build into your
> script, and we don't need to embed application specific logic into
> e2fsprogs. The ability to just run the journal without doing any
> further checking seems like a reasonable thing to add to e2fsck ---
> and by using dumpe2fs -h you'll be able to detect all possible file
> system errors (not just the ones which are reported via the journal
> error system).
>
> Does that sound reasonable to you?

Yes, we are in perfect agreement now :)

Thanks,
Bernd
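
For illustration, the "dumpe2fs -h" check discussed above could be
sketched roughly as follows in Python. This is not the code from the
lustre_server agent; the function names are made up for this example,
and it assumes the superblock listing contains a line of the form
"Filesystem state: <value>". Only the state "clean" is treated as safe
to mount; anything else (including a missing line) is left for a full
e2fsck run.

```python
import subprocess


def parse_fs_state(dumpe2fs_output):
    """Return the value of the 'Filesystem state' line, or None if absent.

    Assumes the 'key: value' layout of the dumpe2fs -h superblock listing.
    """
    for line in dumpe2fs_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Filesystem state":
            return value.strip()
    return None


def is_safe_to_mount(state):
    """Hypothetical policy: only a plain 'clean' state may be mounted."""
    return state == "clean"


def read_fs_state(device):
    """Run 'dumpe2fs -h <device>' and extract the filesystem state.

    Raises CalledProcessError if dumpe2fs exits with a non-zero status.
    """
    result = subprocess.run(
        ["dumpe2fs", "-h", device],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_fs_state(result.stdout)
```

A resource agent could call read_fs_state() on the device before
mounting and refuse to start the resource when is_safe_to_mount()
returns False.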