From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bob Peterson
Sent: 26 September 2007 16:01
To: linux clustering
Subject: Re: Found unlinked inode

> Hi Jonas,
>
> Well, I can think of one possible explanation. I can't be sure because I don't know your test scenario, but this is my theory. First, a bit of background:
>
> When a node gets "shot", as you say, the metadata for some of its recent operations is likely to still be in the journal it was using. Depending on when it got shot, that metadata may exist only in the journal, if the node went down before the data was written to its final destination on disk.
>
> Ordinarily, that's not a big deal: the next time the file system is mounted, the journal is replayed, the metadata is written to its proper place on disk, and all is well. That's the same for most journaling file systems, as far as I know.
>
> A couple of years ago, one of my predecessors (before I started) made an executive decision to have gfs_fsck *clear* the system journals rather than replay them. I don't know offhand whether the replay code was once there and got taken out, or was never written. At any rate, it seemed like a good idea at the time, and there were several good reasons to justify that decision:
>
> First, if the user is running gfs_fsck, they must already suspect file system corruption. If (and this is a big if) that corruption was caused by recent operations on the file system, then replaying the journal can only compound the problem and cause more corruption, because what is in the journal may itself be based on the corruption. This was more of a concern when GFS bailed out and "withdrew" from the file system because it detected corruption, on the suspicion that GFS itself might somehow have caused it.
>
> Second, if the user is running gfs_fsck because of corruption, we may not be able to assume that the journal contains good metadata worth replaying.
>
> Third, the user always has the option of replaying the journal before running gfs_fsck:
>
> 1. mount the file system after the crash (to replay the journal)
> 2. unmount the file system
> 3. run gfs_fsck
>
> The decision to have gfs_fsck clear the journals was probably made many years ago, before gfs was stable, when these "withdraw" situations were more common.
>
> Some people believe that this was a bad decision. I believe it makes more sense to trust the journal and replay it before doing the rest of the fsck operations, because in "normal" cases where a node dies (often for a reason unrelated to gfs: getting shot, fenced, losing power, a blown power supply, etc.) you stand to lose metadata unless the journal is replayed.
>
> Other journaling file systems replay their journals during fsck, or else they inform the user and ask them to take steps to replay the journal (as above), give them the option to clear it, and so on. So far, gfs_fsck does none of that; it just clears the journals.
>
> To remedy the situation, I've got an open bugzilla, 291551 (which may be marked "private" because it was opened internally--sorry), at least for the gfs2_fsck case; gfs_fsck will likely be covered too. With that bugzilla I intend to either ask the user whether they want the journals replayed, replay them automatically, or try to detect problems with them.
>
> I'm not certain that this is the cause of your corruption, but it's the only one I can think of at the moment.
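(In practice, the three steps Bob lists above amount to roughly the commands below. The volume group, logical volume, and mount point are made-up placeholders, and if the file system uses lock_dlm locking, the cluster infrastructure needs to be up on the node doing the mount.)

    # 1. Mount the file system once so the kernel replays the dirty journal.
    #    (device and mount point are placeholders -- adjust for your setup)
    mount -t gfs /dev/myvg/gfslv /mnt/gfs

    # 2. Unmount it again so nothing has the file system in use.
    umount /mnt/gfs

    # 3. Run the off-line checker against the raw device.
    gfs_fsck /dev/myvg/gfslv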
Hi Bob,

This sounds like a reasonable explanation except for one thing: the filesystem was cleanly umounted on both nodes before I ran gfs_fsck, so there shouldn't be any journal to replay, right?

Anyway, I've restarted the test, and if I'm able to recreate this error I'll first take a copy of the filesystem and then check whether running "mount + umount" makes the gfs_fsck error go away (a rough sketch of that check follows below).

Regards,
Jonas
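(A rough sketch of the check Jonas describes, again with made-up device and paths: take a block-level copy of the suspect file system first, so its original state is preserved, then see whether a mount/umount cycle changes what gfs_fsck reports.)

    # Preserve the suspect file system before touching it
    # (it must not be mounted anywhere; device and paths are placeholders).
    dd if=/dev/myvg/gfslv of=/backup/gfslv.img bs=1M

    # Then check whether a mount/umount cycle (which replays the journal)
    # makes the gfs_fsck complaint go away.
    mount -t gfs /dev/myvg/gfslv /mnt/gfs
    umount /mnt/gfs
    gfs_fsck /dev/myvg/gfslv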