Re: Detecting default signedness of char in ext4 (despite -funsigned-char)

"Theodore Ts'o" <tytso@xxxxxxx> · Wed, 18 Jan 2023 15:18:16 -0500

On Tue, Jan 17, 2023 at 09:14:12PM -0800, Eric Biggers wrote:
> 
> Well, reading the code more carefully, the on-disk ext4 superblock can contain
> EXT2_FLAGS_SIGNED_HASH, EXT2_FLAGS_UNSIGNED_HASH, or neither.  "Neither" is the
> legacy case.  The above existing code in ext4 is handling the "neither" case by
> setting the flag corresponding to the default signedness of char.  So yes, that
> migration code was always broken if you moved the disk from a platform with
> signed char (e.g. x86) to a platform with unsigned char (e.g. arm).  But,
> -funsigned-char breaks it whenever the disk stays on a platform with signed
> char.  That seems much worse.  Though, it's also a migration for legacy
> filesystems, so maybe that code isn't needed often anymore anyway...

The migration code will detect what the current signedness of char,
and then *set* the SIGNED/UNSIGNED flag.  Once we've set the flag,
we'll continue to use it, and modern versions of e2fsprogs will set
the SIGNED/UNSIGNED flag again using the default signedness of char.
Once the signed/unsigned flag is set, then we can freely move the file
system between architectures and it's not a problem.  So if you have a
file system that was created before 2007 (e2fsprogs v1.40), the first
time you mount that file system on a kernel or run e2fsck on a scheme
that understands the new SIGNED/UNSIGNED scheme, then we use whatever
hash variant that had been used on the pre-2007 kernel/e2fsprogs
forever.

commit f77704e416fca7dbe4cc91abba674d2ae3c14f6f
Author: Theodore Ts'o <tytso@xxxxxxx>
Date:   Sat Nov 11 22:32:35 2006 -0500

    Add directory hashed signed/unsigned hint to superblock

    The e2fsprogs and kernel implementation of directory hash tree has a
    bug which causes the implementation to be dependent on whether
    characters are signed or unsigned.  Platforms such as the PowerPC,
    Arm, and S/390 have signed characters by default, which means that
    hash directories on those systems are incompatible with hash
    directories on other systems, such as the x86.

    To fix this we add a new flags field to the superblock, and define two
    new bits in that field to indicate whether or not the directory should
    be signed or unsigned.  If the bits are not set, e2fsck and fixed
    kernels will set them to the signed/unsigned value of the currently
    running platform, and then respect those bits when calculating the
    directory hash.  This allows compatibility with current filesystems,
    as well as allowing cross-architectural compatibility.

    Addresses Debian Bug: #389772

    Signed-off-by: "Theodore Ts'o" <tytso@xxxxxxx>

So techncially speaking the migration code is probably not needed any
more, but it's not a lot of code (just a few lines of code to set the
flags if neither flag is set), and it's always possible that that you
might try to mount a pre-2006 file system on a 2022 system, so we
might as well keep it.

> Anyway, as I found out after my initial reply, and which you noticed too, the
> thing that actually caused the breakage here is the xattr hash.  The xattr hash
> isn't related to the logic in __ext4_fill_super(), or to EXT2_FLAGS_SIGNED_HASH
> and EXT2_FLAGS_UNSIGNED_HASH.  Those are only for the dirhash.  Unfortunately,
> ext4's xattr hash didn't get fixed years ago when the dirhash did.  The xattr
> hash still just always uses 'char'.

I *think* we're OK, because we only use XOR (^) on the char in the
xattr entry hash:

static __le32 ext4_xattr_hash_entry(char *name, size_t name_len, __le32 *value,
				    size_t value_count)
{
	__u32 hash = 0;

	while (name_len--) {
		hash = (hash << NAME_HASH_SHIFT) ^
		       (hash >> (8*sizeof(hash) - NAME_HASH_SHIFT)) ^
		       *name++;
	}
	while (value_count--) {
		hash = (hash << VALUE_HASH_SHIFT) ^
		       (hash >> (8*sizeof(hash) - VALUE_HASH_SHIFT)) ^
		       le32_to_cpu(*value++);
	}
	return cpu_to_le32(hash);
}

And XOR shouldn't matter whether *name is signed or unsigned.  Am I
missing something?

						- Ted