On Thu, Dec 29, 2022 at 10:55:05AM -0800, Linus Torvalds wrote: > Also, I'm surprised this hasn't been an issue earlier - 'char' has > always been unsigned on arm (among other architectures), so if this > test started failing now on x86-64 due to -funsigned-char, it has > presumably been failing on arm the whole time. That's the curious part, indeed... > Oh, I think I see one potential problem in ext4: > > ext4_xattr_hash_entry() is hot garbage. Lookie here: > > while (name_len--) { > hash = (hash << NAME_HASH_SHIFT) ^ > (hash >> (8*sizeof(hash) - NAME_HASH_SHIFT)) ^ > *name++; > } > > so that hash will now depend on the sign of that 'char *name' pointer. > > If that hash ever has any long-term meaning (ie saved on disk or > exposed some other way), that would be problematic. Note that ext4 has lots of sign-specific code for hashing. Only some of it can now be removed, since compatibility with old file systems must be preserved. But what I mean is the code that begins in super.c: i = le32_to_cpu(es->s_flags); if (i & EXT2_FLAGS_UNSIGNED_HASH) sbi->s_hash_unsigned = 3; else if ((i & EXT2_FLAGS_SIGNED_HASH) == 0) { #ifdef __CHAR_UNSIGNED__ if (!sb_rdonly(sb)) es->s_flags |= cpu_to_le32(EXT2_FLAGS_UNSIGNED_HASH); sbi->s_hash_unsigned = 3; #else if (!sb_rdonly(sb)) es->s_flags |= cpu_to_le32(EXT2_FLAGS_SIGNED_HASH); #endif } The second part of that #else can now go away. And then maybe the whole expression can be simplified. These actually wind up being used in namei.c: hinfo->hash_version += EXT4_SB(dir->i_sb)->s_hash_unsigned; , which then sets the hash version that's selected in hash.c: switch (hinfo->hash_version) { case DX_HASH_LEGACY_UNSIGNED: hash = dx_hack_hash_unsigned(name, len); break; case DX_HASH_LEGACY: hash = dx_hack_hash_signed(name, len); break; case DX_HASH_HALF_MD4_UNSIGNED: str2hashbuf = str2hashbuf_unsigned; fallthrough; case DX_HASH_HALF_MD4: p = name; [...] And so on. dx_hack_hash_unsigned() and dx_hack_hash_signed() are the same functions, except one uses `unsigned char` and the other uses `signed char`. It's unfortunate these exist, but now it's part of the on-disk format, so they have to stick around (along with other warts like "halfmd4"). But at least for new file systems, things should be unified. Anyway, it looks like for *these* hashes, the ext4 developers did consider the signedness issue. Sounds like maybe it was left out of ext4_xattr_hash_entry(), which does indeed look like it's part of the on-disk representation: static int ext4_xattr_inode_verify_hashes(struct inode *ea_inode, struct ext4_xattr_entry *entry, void *buffer, size_t size) { u32 hash; /* Verify stored hash matches calculated hash. */ hash = ext4_xattr_inode_hash(EXT4_SB(ea_inode->i_sb), buffer, size); if (hash != ext4_xattr_inode_get_hash(ea_inode)) return -EFSCORRUPTED; if (entry) { __le32 e_hash, tmp_data; /* Verify entry hash. */ tmp_data = cpu_to_le32(hash); e_hash = ext4_xattr_hash_entry(entry->e_name, entry->e_name_len, &tmp_data, 1); if (e_hash != entry->e_hash) return -EFSCORRUPTED; } return 0; } So if ext4_xattr_hash_entry() is indeed broken with respect to heisensignness, then this stuff has always been broken, and it's probably good that this is being unearthed... Jason