Re: ext2/4 large inode xattr mismash?

Andreas Dilger <adilger@xxxxxxxxx> · Mon, 27 Feb 2017 15:42:11 -0700

On Feb 27, 2017, at 11:10 AM, Darrick J. Wong <darrick.wong@xxxxxxxxxx> wrote:
> 
> Hi Ted,
> 
> Today ngkaho1234 pointed out on IRC that if you do the following:
> 
> $ mkfs.ext2 /dev/sda -F
> $ mount /dev/sda /mnt -t ext4
> $ touch /mnt/x
> $ setfattr -n user.name1 -v moo /mnt/x
> $ umount /mnt
> $ mount /dev/sda /mnt -t ext2
> $ setfattr -n user.name1 -v urk /mnt/x
> $ umount /mnt
> 
> Then you get this broken looking result:
> $ mount /dev/sda /mnt -e ext4
> $ getfattr /mnt/x
> getfattr: Removing leading '/' from absolute path names
> # file: mnt/x
> user.name1
> user.name1
> 
> Looking through debugfs, it seems that ext4 wrote user.name1=moo as an
> ibody xattr, then ext2 wrote user.name1=urk into an external xattr block
> and set i_file_acl.  ext4 sees the attr it wrote, ext2 sees the attr it
> wrote, and neither can see the attr the other wrote.
> 
> $ debugfs -R 'features' /dev/sda
> Filesystem features: ext_attr resize_inode dir_index filetype sparse_super large_file
> 
> $ debugfs -R 'stat /x' /dev/sda
> Inode: 12   Type: regular    Mode:  0644   Flags: 0x0
> Generation: 2341792653    Version: 0x00000000:00000001
> User:     0   Group:     0   Project:     0   Size: 0
> File ACL: 617    Directory ACL: 0
> Links: 1   Blockcount: 8
> Fragment:  Address: 0    Number: 0    Size: 0
> ctime: 0x58b45e2b:926e39a8 -- Mon Feb 27 09:13:15 2017
> atime: 0x58b45e21:c4f666c0 -- Mon Feb 27 09:13:05 2017
> mtime: 0x58b45e21:c4f666c0 -- Mon Feb 27 09:13:05 2017
> crtime: 0x58b45e21:c4f666c0 -- Mon Feb 27 09:13:05 2017
> Size of extra inode fields: 32
> Extended attributes:
>  user.name1 (3) = "moo"
>  user.name1 (3) = "urk"
> BLOCKS:
> 
> This is wrong -- we don't have RO_COMPAT_EXTRA_ISIZE set, so ext4 should
> /not/ be setting i_extra_isize=32.  Unfortunately, it does set
> i_extra_isize, which enables ext4_xattr_ibody_set to write xattrs into
> the inode body.  That's a problem, because ext2 doesn't know about
> inline attrs or i_extra_isize.

I suspect that this isn't a big problem in real life, since most systems
these days are using ext4 to mount ext2 filesystems, instead of using the
separate ext2 module, and it could understand the extra data anyway.

The thing to check/fix in the ext4 code is to not set i_extra_isize if the
EXTRA_ISIZE feature isn't set.  Also, e2fsck should set the feature flag
if it finds valid extra inode data beyond 128 bytes, since this is already
true out in the wild, so we don't want to clobber existing (meta)data in
large inodes.

Unfortunately, our policy is to not enable features in-kernel automatically,
to avoid the problem of potentially making the filesystem unmountable,
otherwise we could do the same in ext4.  That said, e2fsck will prompt the
user to fix this if needed.

> It occurred to me to check the s_inode_size, which is 256 on this
> supposedly ext2 filesystem.  I'd have thought _fill_super would check
> this value, but apparently its only criteria are that s_inode_size is at
> least 128, a power of 2, and no larger than a block.  But AFAIK ext2 has
> no ability to use any inode space beyond the first 128 bytes, so what
> good are large inodes?

I don't think the 256-byte inode size should be considered a problem on
ext2 by itself.  The old code understands the larger inode size, it just
didn't do anything with that larger size until ext3 had fast xattrs and
the added timestamp fields.  The other thought is to just get rid of the
ext2 code completely, and the problem would be gone...  I don't think
there are (m)any cases where ext2 is faster than ext4, if nojournal
is used to level the playing field.

> Oh, and e2fsck apparently doesn't notice if there are inodes with
> i_extra_isize set even if ro_compat_extra_isize is not set.
> 
> I could see a bunch of fixes to resolve this problem:
> 
> 1) Teach ext4 not to set i_extra_isize unless the feature bit is set.
> 2) ext2_iget grows the ability to return -EFSCORRUPTED for inodes that
>   have big inodes and i_extra_isize set, to encourage people to run
>   e2fsck.

No, it shouldn't be accessing that space if the feature flag isn't set,
and should return an error at mount if it is (if not read-only).  That
is the reason the EXTRA_ISIZE was created as a read-only feature, so
that ext2 can't add xattrs or change the extended timestamps that conflict
with values in the large inode.

> 3) e2fsck will move attrs and zap i_extra_isize if i_extra_isize is set
>   on a filesystem that doesn't support it.

Wouldn't it be better to set the feature flag rather than deleting the
xattrs (which may not fit into the xattr block and/or may consume the
free space of the filesystem, and at a minimum will hurt performance)?

> 4) mke2fs probably should stop allocating 256 byte inodes with the ext2
>   and ext3 profiles, though it's not clear to me why the ext2 driver
>   even allows this -- maybe there's some context here I don't know?

Well, this in itself isn't harmful, and allows those filesystems to be
updated to ext4 easily in the future.  I think the root of the problem
is that ext4 is enabling a feature without checking the feature flag.

> So ... /me isn't sure how to deal with this.  List? :)
> 
> --D

Cheers, Andreas

Attachment:
signature.asc

Description: Message signed with OpenPGP