On Thu, Aug 21, 2003 at 02:47:06PM -0600, Andreas Dilger wrote:
> On Aug 21, 2003 15:28 -0400, Erez Zadok wrote:
> > In message <20030821190811.GC1040@xxxxxxxxxxxxx>, Mike Fedyk writes:
> > > There's no need to support it in the kernel. The inode number is kept in
> > > the superblock, and that's updated at mkfs and tune2fs time, not from the
> > > kernel.

Actually, there is one possible reason why we might want to have
kernel support for this --- and that's so that if the root filesystem
is corrupted in this manner, the kernel can automatically fall back to
trying to use the backup journal when it does the journal replay prior
to mounting the root filesystem.

> There are not, AFAICS, two copies of the journal being kept, which would
> require kernel changes and cause an even larger performance hit for ext3.
>
> Instead, the journal inode number is being kept in all of the backup
> superblocks (I don't think it was in the past). Secondly, there is a
> new "backup journal inode" (also kept in the superblock + backups),
> which I infer holds a duplicate of the blocks allocated to the journal.

The journal inode number was kept in all of the backup superblocks if
the journal was created using mke2fs and tune2fs. There was a bug in
e2fsck which was fixed in the patch that I included in my previous
mail message where when e2fsck moved the journal from /.journal to the
hidden journal inode, it didn't write out the changed journal inode
number to the backup superblocks.

> Having only the inode i_blocks field duplicated in a backup inode means
> that there is no (new) overhead writing to the journal, yet if the journal
> inode itself gets corrupted (very possible because it shares the same disk
> block with the root inode and is right at the beginning of the disk), we
> have a chance to recover the journal data. As a result, the journal itself
> will very likely have backups of recently-written blocks and can "self heal"
> from all sorts of nasty corruptions.

Correct.
Actually, what's being backed up is the i_block[] array as well as
the i_size field. It turns out that the i_blocks (number of blocks)
field isn't needed by e2fsck, so I didn't bother backing it up. Total
cost to the superblock? 64 bytes. (16 32-bit unsigned integers.)

> What would also be needed (not sure if this is implemented or not) is that
> in the case of a corrupt superblock e2fsck assumes "needs_recovery" is set
> if "has_journal" is set and the (backup) journal inode can be read, so that
> the journal replay is actually done. That will almost always result in the
> primary superblock being restored from somewhere in the journal, along with
> other useful things like bitmaps and such.

Ooh, good point. Yeah, I definitely need to do that, since if the
primary superblock is trashed, the needs_recovery flag won't be set in
the backup superblocks. I need to think a bit to make sure there won't
be any potential lossage cases caused by attempting to replay a
journal when it's not necessary, but I don't think there are any.

                                                - Ted

_______________________________________________
Ext3-users@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/ext3-users
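
For readers who want the arithmetic and the replay rule from the
message above in concrete form, here is a rough sketch in C. It is
illustration only and not the e2fsprogs code: the struct name, field
order, and both helper functions are hypothetical. The only facts
taken from the thread are that i_block[] and i_size are backed up,
that this costs 16 32-bit words, and that a backup superblock should
be treated as if needs_recovery were set when has_journal is set and
the journal can be read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define N_BLOCKS 15 /* entries in an ext2/ext3 inode's i_block[] array */

/* Hypothetical layout: the actual superblock field is not named in
 * the message, so this struct is an assumption for illustration. */
struct journal_backup {
    uint32_t i_block[N_BLOCKS]; /* copy of the journal inode's block map */
    uint32_t i_size;            /* copy of the journal inode's size */
};

/* 15 + 1 = 16 words of 4 bytes each, i.e. the 64 bytes quoted above. */
static_assert(sizeof(struct journal_backup) == 64,
              "backup should occupy exactly 64 bytes");

/* Hypothetical helper: copy the two backed-up fields of a journal inode. */
static void save_journal_backup(struct journal_backup *b,
                                const uint32_t i_block[N_BLOCKS],
                                uint32_t i_size)
{
    memcpy(b->i_block, i_block, sizeof(b->i_block));
    b->i_size = i_size;
}

/* Hypothetical helper: decide whether to replay the journal.
 * With the primary superblock, trust its needs_recovery flag.
 * With a backup superblock, the flag is stale, so replay whenever
 * a journal exists and can be read. */
static bool should_replay_journal(bool has_journal, bool needs_recovery,
                                  bool using_backup_sb, bool journal_readable)
{
    if (!has_journal || !journal_readable)
        return false;
    if (using_backup_sb)
        return true;
    return needs_recovery;
}
```

Whether an unnecessary replay is always harmless is exactly the open
question raised in the message, so the using_backup_sb branch should be
read as the proposed behaviour, not a settled one.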