Re: [RFC] jbd2 metadata checksumming

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Sep 23, 2011 at 06:07:46PM -0600, Andreas Dilger wrote:
> On 2011-09-23, at 4:51 PM, Darrick J. Wong wrote:
> > While I'm working on adding metadata checksumming to ext4, I figured that I
> > ought to look into the similar feature in jbd2.  At first I thought I'd simply
> > change the default crc algorithm to crc32c and update the field in the commit
> > block, but then it was suggested to me that I move that field into the journal
> > superblock so that during recovery we don't have to scan ahead through the
> > transaction to find the commit block so that we can learn the algorithm type.
> > 
> > Doing that seems to require a format change to the superblock to add that
> > field.  I think that adding the crc-type field to the superblock is a rocompat
> > change since we're not changing existing fields, just adding fields.  It looks
> > like the kernel and e2fsprogs code both reject a journal if they find unknown
> > rocompat bits set.  (Using a journal in ro mode is not useful.)
> 
> The question is whether the "rejected journal" means that it is ignored
> during recovery and not replayed at all, or if it prevents mounting?  If it
> is ignored and not replayed, but mount continues, that would lead to filesystem
> corruption, very bad.
> 
> If it prevents mounting, and needs an updated kernel and/or e2fsprogs to
> clear (presumably the kernel will not enable this itself unless told to
> do so by EXT4_FEATURE_INCOMPAT_METADATA_CSUM), that is not so bad, and
> will still allow downgrading to an older kernel as long as the journal is
> replayed.

rocompat/incompat feature flag mismatches both cause ext4 to error out of
ext4_fill_super.  It doesn't appear to try to replay at all.

I think e2fsck actually tries to fix it, where "fix it" seems to mean "blow it
away"... not sure though.  I'll have a closer look Monday morning.

> > I decided to dig deeper to see what exactly the journal checksum covers.  It
> > appears to me that the superblock, revocation blocks, and commit blocks are not
> > covered by a checksum.  Revocation blocks ought to be checksummed because a
> > lost write involving the second sector of a suitably large revocation block
> > could result in the wrong blocks being skipped during recovery.  It seems like
> > it would be easy to extend the current journal_checksum feature to cover the
> > commit block, and adding a checksum to the superblock seems trivial.
> > 
> > Lastly, if I'm already making change, I might as well bake the journal UUID
> > into the checksum as well.  The transaction ID is already in each metadata
> > block by virtue of the common block header.
> > 
> > So to summarize, I propose:
> > 
> > 1. Adding a JBD2_FEATURE_ROCOMPAT_CHECKSUM_V2 field, which provides:
> > 2. A u8 field at offset 0x50 in the superblock which identifies the checksum
> >   algorithm that's in use;
> > 3. A u32 field at offset 0x54 in the superblock to hold the superblock's
> >   checksum;
> 
> Why not put it at the end of the superblock, so that it can cover the whole
> thing?

There's a s_users field sitting at the end of the structure.  We probably don't
want that structure to grow beyond 1024 bytes for the sake of small-block
filesystems.  Any recommendations?

> > 4. Changing the revocation block code to put a checksum in the 4 bytes
> >   following the revocation data, and to ensure those 4 bytes always exist;
> 
> It would be easier to see the changes if you included the structs.

Too bad the ext4 wiki is still down. :(

/*
 * The journal superblock.  All fields are in big-endian byte order.
 */
typedef struct journal_superblock_s
{
/* 0x0000 */
        journal_header_t s_header;

/* 0x000C */
        /* Static information describing the journal */
        __be32  s_blocksize;            /* journal device blocksize */
        __be32  s_maxlen;               /* total blocks in journal file */
        __be32  s_first;                /* first block of log information */

/* 0x0018 */
        /* Dynamic information describing the current state of the log */
        __be32  s_sequence;             /* first commit ID expected in log */
        __be32  s_start;                /* blocknr of start of log */

/* 0x0020 */
        /* Error value, as set by jbd2_journal_abort(). */
        __be32  s_errno;

/* 0x0024 */
        /* Remaining fields are only valid in a version-2 superblock */
        __be32  s_feature_compat;       /* compatible feature set */
        __be32  s_feature_incompat;     /* incompatible feature set */
        __be32  s_feature_ro_compat;    /* readonly-compatible feature set */
/* 0x0030 */
        __u8    s_uuid[16];             /* 128-bit uuid for journal */

/* 0x0040 */
        __be32  s_nr_users;             /* Nr of filesystems sharing log */

        __be32  s_dynsuper;             /* Blocknr of dynamic superblock copy*/

/* 0x0048 */
        __be32  s_max_transaction;      /* Limit of journal blocks per trans.*/
        __be32  s_max_trans_data;       /* Limit of data blocks per trans. */

/* 0x0050 */
	__u8	s_checksum_type;	/* checksum type */
	__u8	s_char_pad[3];		/* not used */
	__be32	s_checksum;		/* crc32c(superblock) */
        __u32   s_padding[42];

/* 0x0100 */
        __u8    s_users[16*48];         /* ids of all fs'es sharing the log */
/* 0x0400 */
} journal_superblock_t;

> > 5. Adding the journal UUID to each checksum computation;
> > 6. Extend the commit checksum to cover the commit block itself, with the commit
> >   block checksum field zeroed during the computation, of course;
> > 7. Changing the default algorithm to crc32c; and
> > 8. Updating ext4 to enable both checksum fields at journal load time, if the
> >   user supplies the journal_checksum mount option.
> 
> Probably this should also be conditional on the ext4 code using the
> EXT4_FEATURE_INCOMPAT_METADATA_CSUM, so that we know the kernel will
> be able to recover, and the user has explicitly requested this.
> 
> There is a mechanism for the ext4 code to pass features to the jbd2 code
> already, so this shouldn't be a problem.

So you're saying that if metadata_csum is set in ext4 and the user mounts with
journal_checksum, then set the JBD checksum v2 flag?  Probably makes sense.

How about ocfs?

--D
> 
> 
> Cheers, Andreas
> 
> 
> 
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Reiser Filesystem Development]     [Ceph FS]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite National Park]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Media]

  Powered by Linux