On Thu, Mar 16, 2023 at 09:48:26AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@xxxxxxxxxx>
>
> Back in the 6.2-rc1 days, Eric Whitney reported a fstests regression in
> ext4 against generic/454. The cause of this test failure was the
> unfortunate combination of setting an xattr name containing UTF8 encoded
> emoji, an xattr hash function that accepted a char pointer with no
> explicit signedness, signed type extension of those chars to an int, and
> the 6.2 build tools maintainers deciding to mandate -funsigned-char
> across the board. As a result, the ondisk extended attribute structure
> written out by 6.1 and 6.2 were not the same.
>
> This discrepancy, in fact, had been noticeable if a filesystem with such
> an xattr were moved between any two architectures that don't employ the
> same signedness of a raw "char" declaration. The only reason anyone
> noticed is that x86 gcc defaults to signed, and no such -funsigned-char
> update was made to e2fsprogs, so e2fsck immediately started reporting
> data corruption.
>
> After a day and a half of discussing how to handle this use case (xattrs
> with bit 7 set anywhere in the name) without breaking existing users,
> Linus merged his own patch and didn't tell the mailing list. None of
> the developers noticed until AUTOSEL made an announcement.
>
> In the end, this problem could have been detected much earlier if there
> had been any useful tests of hash function(s) in use inside ext4 to make
> sure that they always produce the same outputs given the same inputs.
>
> The XFS dirent/xattr name hash takes a uint8_t*, so I don't think it's
> vulnerable to this problem. However, let's avoid all this drama by
> adding our own self test to check that the da hash produces the same
> outputs for a static pile of inputs on various platforms. This will be
> followed up in xfsprogs with a similar patch.
>
> Link: https://lore.kernel.org/linux-ext4/Y8bpkm3jA3bDm3eL@debian-BULLSEYE-live-builder-AMD64/
> Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>

I'm going to trust that your binary tables exercise the hash in the
manner needed because I don't have time right now to manually decode
it. With that caveat, everything else looks fine.

Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>

-- 
Dave Chinner
david@xxxxxxxxxxxxx