Re: Detecting default signedness of char in ext4 (despite -funsigned-char)

"Theodore Ts'o" <tytso@xxxxxxx> · Wed, 18 Jan 2023 16:49:25 -0500

On Wed, Jan 18, 2023 at 03:14:04PM -0600, Linus Torvalds wrote:
> You're missing the fact that 'char' gets expanded to 'int', and in the
> process bit #7 gets copied to bits 8-31 if it is signed.
> 
> Then the xor and the later shifting will move those bits around..

Doh!  One of those C pitfalls that I don't know how I had missed.
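To make the pitfall concrete, here is a small C sketch, not the actual ext4 code: the function name `name_hash`, its `as_signed` parameter, and the shift of 5 are assumptions for illustration. It runs a rotate-and-xor loop over a name while treating each byte either as signed or as unsigned char, so the effect of sign extension on the result is visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed rotate amount, for illustration only. */
#define NAME_HASH_SHIFT 5

/*
 * Hypothetical rotate-and-xor name hash. If as_signed is true, each byte
 * is converted to signed char first, so a byte >= 0x80 is sign-extended
 * when promoted: bit 7 is copied into bits 8-31 (0xF0 -> 0xFFFFFFF0).
 * If as_signed is false, the byte is zero-extended (0xF0 -> 0x000000F0).
 */
static uint32_t name_hash(const char *name, size_t len, bool as_signed)
{
	uint32_t hash = 0;

	assert(name != NULL || len == 0);
	while (len--) {
		unsigned char byte = (unsigned char)*name++;
		/* Converting 0x80..0xFF to signed char is implementation-
		 * defined; GCC and Clang wrap it to -128..-1. */
		uint32_t c = as_signed ? (uint32_t)(int)(signed char)byte
				       : (uint32_t)byte;

		/* Rotate left, then xor in the byte; the rotation moves any
		 * sign-extended high bits around in later iterations. */
		hash = (hash << NAME_HASH_SHIFT) ^
		       (hash >> (32 - NAME_HASH_SHIFT)) ^ c;
	}
	return hash;
}
```

For a pure-ASCII name such as `user.test` both variants return the same value. For a name made of the UTF-8 octopus emoji (bytes F0 9F 90 99) they diverge at the first byte, since 0xF0 enters the hash as 0xFFFFFFF0 in one case and 0x000000F0 in the other.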

I agree with your analysis that in practice, almost no one
actually uses non-ASCII characters for xattr names.  (Filenames, yes,
but in general xattr names are set by programs, not by users.)  So
besides xfstests generic/454, how likely is it that people would be
using things like octopus emojis or Unicode characters such as <GREEK
UPSILON WITH ACUTE AND HOOK SYMBOL>?  Very unlikely, I'd argue.  I
might be a bit more worried about userspace applications written for,
say, Red Flag Linux in China using Chinese characters in xattrs, but
I'd argue even there it's much more likely that this would be in the
xattr values as opposed to the name.
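Signedness can only change the hash of a name that contains at least one byte with bit 7 set, and every non-ASCII character encoded in UTF-8 consists entirely of such bytes. Whether a given xattr name is affected can therefore be checked directly; the helper below is a hypothetical sketch, not an existing ext4 or e2fsprogs function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if any byte has bit 7 set, i.e. if the
 * name's hash would differ between signed and unsigned char. */
static bool name_hash_depends_on_signedness(const char *name, size_t len)
{
	assert(name != NULL || len == 0);
	for (size_t i = 0; i < len; i++)
		if ((unsigned char)name[i] & 0x80)
			return true;
	return false;
}
```

An ASCII name like `user.foo` is unaffected, while a name containing U+03D3 (GREEK UPSILON WITH ACUTE AND HOOK SYMBOL, UTF-8 bytes CF 93) is affected.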

In terms of what we should do next: if we only pick signed, then any
edge-case users who actually did use non-ASCII characters in xattr
names on PowerPC, ARM, or S/390 would be broken.  That's simpler, and
if we think there are darned few of them, I guess we could do that.

That being said, it's not that much more work to use a flag in the
superblock to indicate whether or not we should be explicitly casting
*name to either a signed or unsigned char, and then set that flag
automagically, to avoid problems both for people who created the file
system on, say, x86 before the signed-to-unsigned char transition, and
for people who created it natively on PowerPC, ARM, or S/390.

The one thing that makes this a bit more complex is that, either way,
we need to change both the kernel and e2fsprogs.  That is why, if we do
the signed/unsigned xattr hash flag, it's important to set the flag
value to whatever the "default" signedness of char was on that
architecture before 6.2-rc1.
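Picking that default cannot rely on a runtime test such as `CHAR_MIN < 0`, or on GCC's `__CHAR_UNSIGNED__` macro, in a build that passes -funsigned-char, because both then always report "unsigned". The hypothetical helper below instead keys off predefined architecture macros. Its list covers only the architectures named in this thread and is assumed, not exhaustive.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: was plain char signed by default on the build
 * architecture before the kernel switched to -funsigned-char? Only the
 * architectures named in this thread are listed; others would need to
 * be added for a real implementation.
 */
static bool arch_char_was_signed(void)
{
#if defined(__powerpc__) || defined(__arm__) || defined(__aarch64__) || \
	defined(__s390__)
	return false; /* unsigned char by default on these */
#else
	return true;  /* e.g. x86 and x86-64: signed char by default */
#endif
}
```

The result would then be passed as the default when the hash flag is first recorded, so a file system created on x86 keeps signed hashing and one created on PowerPC, ARM, or S/390 keeps unsigned hashing.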

						- Ted