Re: [PATCH 1/3] xfs: stabilize the tolower function used for ascii-ci dir hash computation

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Tue, 4 Apr 2023 11:58:13 -0700

On Tue, Apr 4, 2023 at 11:32 AM Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
>
> >
> > And when you compare to glibc, you only compare to "some random locale
> > that happens to be active right now". Something that the kernel
> > itself cannot and MUST NOT do.
>
> What then is the point of having tolower in the kernel at all?

It's perfectly fine for US-ASCII. So together with 'isascii()' it is just fine.
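
To make the "tolower plus isascii()" point concrete, here is a minimal
sketch of a case fold that only ever touches the 7-bit range. The names
sketch_isascii() and sketch_ascii_tolower() are hypothetical and this is
not the kernel's implementation; it only illustrates the idea that bytes
0x80..0xff pass through unchanged, whatever locale userspace has active.

```c
#include <stdbool.h>

/* Hypothetical helpers for illustration; not the kernel's own code. */

/* True only for the 7-bit US-ASCII range 0x00..0x7f. */
bool sketch_isascii(unsigned char c)
{
	return c <= 0x7f;
}

/* Fold 'A'..'Z' to 'a'..'z' and leave every other byte untouched. */
unsigned char sketch_ascii_tolower(unsigned char c)
{
	if (sketch_isascii(c) && c >= 'A' && c <= 'Z')
		return c + ('a' - 'A');
	return c;
}
```

With this shape, a Latin1 byte such as 0xC0 comes back as 0xC0, so the
result does not depend on how anybody chose to fill in the upper half
of a ctype table.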

Now, if you ask me why the data itself isn't then just limited to
US-ASCII, I can only say "history and bad drugs".

The Linux tolower() goes back to Linux-0.01, and my original version
actually got this right, and left all the upper 128 characters as 0 in
the _ctype[] array.

But then at some point, we failed at life, and started filling in the
upper bit cases too.

Looking around, it was at Linux-2.0.1, back in 1996. It's way before
we had good changelogs, so I can't really say *why* we did that
change, but I do believe that bad taste was involved.

But at least it was *somewhat* reasonable to do a Latin1-based ctype
back in 1996:

  --- v2.0.0/linux/lib/ctype.c Mon Nov 27 15:53:48 1995
  +++ linux/lib/ctype.c Tue Jul  2 19:08:43 1996

I would not object to going back to the proper US-ASCII only version
today, but I fear that we might have a lot of subtle legacy use ;(

                   Linus

PS Heh, and now that I look at my original ctype.h, find the bug.
Clearly that wasn't *used*:

  #define tolower(c) (_ctmp=c,isupper(_ctmp)?_ctmp+('a'+'A'):_ctmp)
  #define toupper(c) (_ctmp=c,islower(_ctmp)?_ctmp+('A'-'a'):_ctmp)

and they weren't fixed until 0.11 - probably because nothing actually used them.
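
For anyone who does not want to hunt for it: the tolower() offset is
'a'+'A' where it presumably should be 'a'-'A'. A small sketch of the
arithmetic follows, written as plain functions rather than macros. The
names buggy_tolower() and fixed_tolower() are hypothetical, and the
explicit range check standing in for isupper() is an assumption made to
keep the example self-contained.

```c
/* Hypothetical function forms of the macros above, ASCII only. */

int buggy_tolower(int c)
{
	/* 'a' + 'A' is 97 + 65 = 162, so 'A' (65) becomes 227, not 'a'. */
	return (c >= 'A' && c <= 'Z') ? c + ('a' + 'A') : c;
}

int fixed_tolower(int c)
{
	/* 'a' - 'A' is 32, the distance between the two ASCII cases. */
	return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}
```

The toupper() macro already uses 'A'-'a' (that is, -32), which is why
only the tolower() side is wrong.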