Re: [PATCH 1/2] libext2fs: add metadata checksum and snapshot feature flags

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Sep 15, 2011 at 06:09:04PM -0600, Andreas Dilger wrote:
> On 2011-09-15, at 5:41 PM, Ted Ts'o wrote:
> > On Thu, Sep 15, 2011 at 05:34:41PM -0600, Andreas Dilger wrote:
> >> 
> >> Darrick and I discussed zeroing the checksum fields, but then there is a
> >> race with other threads accessing the same structure.
> > 
> > What race are you worried about?  The moment you modify some part of
> > the data structure, the checksum is going to be wrong.  This is true
> > whether you zero out the checksum field before you do the calculations
> > or not.
> 
> If you are reading the structure (not modifying it) and trying to verify
> the checksum, it would be necessary to read-lock the structure, zero
> the field, compute the checksum, reset the field, unlock, and then
> compare checksums.  Alternately, one would have to make a copy of the
> struct to zero out the field and compute the checksum on the copy.  Both
> are more complex than just doing the checksum on two separate chunks.

I think a lot of the metadata update code already has locking to prevent
multiple writers.  I'm not as sure about the locking around the read calls,
though I suspect that we take some of the same locks before starting the read.
However, I've yet to audit all those code paths.  Point is, we might already be
doing most of the necessary locking.  Too bad I'm not sure.
> 
> >> If we went to a crc32c LSB for filesystems with RO_COMPAT_CSUM it would
> >> be possible to change how it is computed.  Since we have freedom to move
> >> the checksum field now, why have the added complexity to do zeroing of
> >> the field or two chunks?
> > 
> > Why is zero'ing out the field complex?  It's a single line of code....
> > It's certainly easier than doing it in two chunks, and there will be
> > some data structures (the block group descriptors at the very least)
> > where zero'ing the checksum is definitely going to be the easier way
> > to go.
> 
> No, because for group descriptors, the size is conditional on whether
> RO_COMPAT_GDT_CSUM is set, so it is just as easy to always compute the
> first 32-byte crc32 (excluding the bg_checksum field) and then only
> conditionally compute the second 32-byte checksum.  The same is true
> of the inode checksum and s_inode_size, if the checksum is at the last
> field of struct ext2_inode.

Um.... it looks to me like the size is conditional on
EXT4_FEATURE_INCOMPAT_64BIT being set and then sb.s_desc_size.  Are you
suggesting that we make the group descriptor checksum a full 32 bits and just
stuff the upper 16 bits somewhere near the end of the structure?  I suppose
that would leave us with very little space in the group descriptor.  On the
other hand the 32-bit format is totally full already and any size increase
depends on 64bit, which means we can make it even bigger with little pain if we
need to.

Incidentally, I have cached versions of the ext4 disk layout and metadata
checksumming wiki pages mirrored here:
http://djwong.org/docs/ext4_metadata_checksums.html
http://djwong.org/docs/ext4_disk_layout.pdf

--D

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Reiser Filesystem Development]     [Ceph FS]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite National Park]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Media]

  Powered by Linux