Re: [linus:master] [kbuild] 3bc753c06d: xfstests.generic.454.fail

"Jason A. Donenfeld" <Jason@xxxxxxxxx> · Fri, 30 Dec 2022 03:08:14 +0100

On Thu, Dec 29, 2022 at 10:55:05AM -0800, Linus Torvalds wrote:
> Also, I'm surprised this hasn't been an issue earlier - 'char' has
> always been unsigned on arm (among other architectures), so if this
> test started failing now on x86-64 due to -funsigned-char, it has
> presumably been failing on arm the whole time.

That's the curious part, indeed...

> Oh, I think I see one potential problem in ext4:
> 
> ext4_xattr_hash_entry() is hot garbage. Lookie here:
> 
>         while (name_len--) {
>                 hash = (hash << NAME_HASH_SHIFT) ^
>                        (hash >> (8*sizeof(hash) - NAME_HASH_SHIFT)) ^
>                        *name++;
>         }
> 
> so that hash will now depend on the sign of that 'char *name' pointer.
> 
> If that hash ever has any long-term meaning (ie saved on disk or
> exposed some other way), that would be problematic.

Note that ext4 has lots of sign-specific code for hashing. Only some of
it can now be removed, since compatibility with old file systems must be
preserved. But what I mean is the code that begins in super.c:

                i = le32_to_cpu(es->s_flags);
                if (i & EXT2_FLAGS_UNSIGNED_HASH)
                        sbi->s_hash_unsigned = 3;
                else if ((i & EXT2_FLAGS_SIGNED_HASH) == 0) {
#ifdef __CHAR_UNSIGNED__
                        if (!sb_rdonly(sb))
                                es->s_flags |=
                                        cpu_to_le32(EXT2_FLAGS_UNSIGNED_HASH);
                        sbi->s_hash_unsigned = 3;
#else
                        if (!sb_rdonly(sb))
                                es->s_flags |=
                                        cpu_to_le32(EXT2_FLAGS_SIGNED_HASH);
#endif
                }

The second part of that #else can now go away. And then maybe the whole
expression can be simplified.

These actually wind up being used in namei.c:

                hinfo->hash_version += EXT4_SB(dir->i_sb)->s_hash_unsigned;

, which then sets the hash version that's selected in hash.c:

        switch (hinfo->hash_version) {
        case DX_HASH_LEGACY_UNSIGNED:
                hash = dx_hack_hash_unsigned(name, len);
                break;
        case DX_HASH_LEGACY:
                hash = dx_hack_hash_signed(name, len);
                break;
        case DX_HASH_HALF_MD4_UNSIGNED:
                str2hashbuf = str2hashbuf_unsigned;
                fallthrough;
        case DX_HASH_HALF_MD4:
                p = name;
		[...]

And so on. dx_hack_hash_unsigned() and dx_hack_hash_signed() are the
same functions, except one uses `unsigned char` and the other uses
`signed char`. It's unfortunate these exist, but now it's part of the
on-disk format, so they have to stick around (along with other warts
like "halfmd4").

But at least for new file systems, things should be unified. Anyway, it
looks like for *these* hashes, the ext4 developers did consider the
signedness issue.

Sounds like maybe it was left out of ext4_xattr_hash_entry(), which does
indeed look like it's part of the on-disk representation:

static int
ext4_xattr_inode_verify_hashes(struct inode *ea_inode,
                               struct ext4_xattr_entry *entry, void *buffer,
                               size_t size)
{
        u32 hash;

        /* Verify stored hash matches calculated hash. */
        hash = ext4_xattr_inode_hash(EXT4_SB(ea_inode->i_sb), buffer, size);
        if (hash != ext4_xattr_inode_get_hash(ea_inode))
                return -EFSCORRUPTED;

        if (entry) {
                __le32 e_hash, tmp_data;

                /* Verify entry hash. */
                tmp_data = cpu_to_le32(hash);
                e_hash = ext4_xattr_hash_entry(entry->e_name, entry->e_name_len,
                                               &tmp_data, 1);
                if (e_hash != entry->e_hash)
                        return -EFSCORRUPTED;
        }
        return 0;
}

So if ext4_xattr_hash_entry() is indeed broken with respect to
heisensignness, then this stuff has always been broken, and it's
probably good that this is being unearthed...

Jason