On Sat, Apr 28, 2012 at 10:19:33AM -0400, Ted Ts'o wrote:
> On Tue, Mar 06, 2012 at 12:49:41PM -0800, Darrick J. Wong wrote:
> > @@ -177,11 +189,17 @@ typedef struct journal_block_tag_s
> >      __be32      t_blocknr;      /* The on-disk block number */
> >      __be32      t_flags;        /* See below */
> >      __be32      t_blocknr_high; /* most-significant high 32bits. */
> > +    __be32      t_checksum;     /* crc32c(uuid+seq+block) */
> >  } journal_block_tag_t;
> >
> >  #define JBD2_TAG_SIZE32 (offsetof(journal_block_tag_t, t_blocknr_high))
> >  #define JBD2_TAG_SIZE64 (sizeof(journal_block_tag_t))
>
> There's a problem with this patch here --- we are changing the size of
> journal_block_tag_t, which is an on-disk data structure.  So for
> 64-bit journals, this represents a format change.  This means that if
> you have a 64-bit file system that needs to have its journal
> recovered, if the journal was written with an older kernel, and then
> we try to recover it with a new kernel, things won't be good.
> Similarly, for e2fsck's recovery code, it's not going to be able to
> recover 64-bit file systems using current coding, since this patch
> series changes the size of JBD2_TAG_SIZE64.
>
> What we need to do is something like this:
>
> #define JBD2_TAG_SIZE64 (offsetof(journal_block_tag_t, t_checksum))
> #define JBD2_TAG_SIZE_CSUM (sizeof(journal_block_tag_t))
>
> And then change the code appropriately in e2fsprogs and in the kernel
> to use the correct tag size depending on the journal options.

Oops.  I forgot to update JBD2_TAG_SIZE64.

I have a question, though -- it looks as though the code that handles
reading and writing tags from raw disk blocks calls journal_tag_bytes()
to determine the tag size, and manually increments a pointer "tagp" to
step through the block.  This construction seems to be sufficient to
deal with possible differences between sizeof(journal_block_tag_t) and
the on-disk tag size, and both increases over the 32bit tag size are
gated on INCOMPAT_64BIT and INCOMPAT_CSUM_V2.
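
To make the pointer-stepping argument concrete, here is a minimal
userspace sketch.  It is not the jbd2 code: the struct, the SKETCH_*
feature bits and both function names are made up for illustration, and
it assumes each of the two features adds exactly one __be32 to the
8-byte base tag.  It also ignores the descriptor block header and the
"last tag" flag, so it only shows how a per-journal tag size decouples
the walk from sizeof() of the in-memory struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for journal_block_tag_t; field order mirrors the patch. */
struct sketch_tag {
	uint32_t t_blocknr;      /* low 32 bits of the block number */
	uint32_t t_flags;
	uint32_t t_blocknr_high; /* present only on 64bit journals */
	uint32_t t_checksum;     /* present only with checksums on */
};

/* Hypothetical feature bits, standing in for the INCOMPAT flags. */
#define SKETCH_64BIT 0x1u
#define SKETCH_CSUM  0x2u

/* Smallest on-disk tag: everything before t_blocknr_high (8 bytes). */
#define SKETCH_TAG_BASE offsetof(struct sketch_tag, t_blocknr_high)

/* On-disk tag size for a given feature set (assumption: +4 each). */
static size_t sketch_tag_bytes(unsigned int features)
{
	size_t sz = SKETCH_TAG_BASE;

	if (features & SKETCH_64BIT)
		sz += sizeof(uint32_t);
	if (features & SKETCH_CSUM)
		sz += sizeof(uint32_t);
	return sz;
}

/*
 * Step through a raw buffer one on-disk tag at a time, the way a
 * "tagp" pointer would, and count how many whole tags fit in it.
 */
static unsigned int sketch_count_tags(const unsigned char *buf, size_t len,
				      unsigned int features)
{
	size_t tag_bytes = sketch_tag_bytes(features);
	const unsigned char *tagp = buf;
	unsigned int n = 0;

	assert(buf != NULL);
	while ((size_t)(tagp - buf) + tag_bytes <= len) {
		n++;
		tagp += tag_bytes; /* advance by disk size, not sizeof() */
	}
	return n;
}
```

With these assumptions, sketch_tag_bytes() gives 8 bytes for a plain
32bit journal, 12 for either feature alone, and 16 for both, so the
walk never depends on sizeof(struct sketch_tag) unless both features
are enabled.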

Had I defined JBD2_TAG_SIZE64 with offsetof() as Ted did above, I think
that journal_tag_bytes() would return the correct on-disk tag size,
which should fix the scenario Ted outlined above.  The tag checksum
set/verify functions would also need to be taught where t_checksum is
on a 32bit journal (in the space occupied by t_blocknr_high).

Could those two suggestions fix the problem without causing us to
discard half the checksum bits?

Well, not quite -- the calculation of tags per block in journal.c below
the comment "journal descriptor can store up to n blocks -bzzz"
probably ought to be using journal_tag_bytes(), not
sizeof(journal_block_tag_t), to figure out how many tags can be crammed
into a disk block, since right now I think it underreports the number
of tags per block on a 32bit journal.  (journal_tag_disk_size() would
be a more descriptive name for journal_tag_bytes().)

As for putting half the checksum into the upper 16 bits of the flags
field -- is journal space at such a premium that we need to overload
the field and reduce the strength of the checksum?  Enabling journal
checksums on a 4k block filesystem causes tags_per_block to decrease
from 512 to 341 on a 32bit journal and from 341 to 256 on a 64bit
journal.  Do transactions typically have that many blocks?  I didn't
think most transactions had 1-2MB of dirty data.

--D

>
> 					- Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html