On Tue, Apr 4, 2023 at 11:32 AM Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
>
> >
> > And when you compare to glibc, you only compare to "some random locale
> > that happens to be active right now". Something that the kernel
> > itself cannot and MUST NOT do.
>
> What then is the point of having tolower in the kernel at all?

It's perfectly fine for US-ASCII. So together with 'isascii()' it is
just fine.

Now, if you ask me why the data itself isn't then just limited to
US-ASCII, I can only say "history and bad drugs".

The Linux tolower() goes back to Linux-0.01, and my original version
actually got this right, and left all the upper 128 characters as 0 in
the _ctype[] array.

But then at some point, we failed at life, and started filling in the
upper bit cases too.

Looking around, it was at Linux-2.0.1, back in 1996. It's way before
we had good changelogs, so I can't really say *why* we did that
change, but I do believe that bad taste was involved.

But at least it was *somewhat* reasonable to do a Latin1-based ctype
back in 1996:

    --- v2.0.0/linux/lib/ctype.c Mon Nov 27 15:53:48 1995
    +++ linux/lib/ctype.c Tue Jul 2 19:08:43 1996

I would not object to going back to the proper US-ASCII only version
today, but I fear that we might have a lot of subtle legacy use ;(

               Linus

PS Heh, and now that I look at my original ctype.h, find the bug.
Clearly that wasn't *used*:

    #define tolower(c) (_ctmp=c,isupper(_ctmp)?_ctmp+('a'+'A'):_ctmp)
    #define toupper(c) (_ctmp=c,islower(_ctmp)?_ctmp+('A'-'a'):_ctmp)

and they weren't fixed until 0.11 - probably because nothing actually
used them.
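
For anyone who wants the two points above spelled out, here is a minimal sketch. It is not the kernel's code: is_ascii(), ascii_tolower() and buggy_tolower() are hypothetical names made up for illustration, and plain functions stand in for the macros and the _ctype[] lookup. It shows a tolower that only touches US-ASCII, plus the offset arithmetic from the quoted 0.01 macro, where 'a'+'A' is 162 instead of the intended 'a'-'A' (32).

```c
/* Hypothetical helpers for illustration only; not the kernel's code. */

/* Nonzero if c fits in 7 bits, i.e. is a US-ASCII code point. */
static int is_ascii(int c)
{
	return (c & ~0x7f) == 0;
}

/* ASCII-only lowercase: anything outside 'A'..'Z' comes back unchanged,
 * including every value with the upper bit set. */
static int ascii_tolower(int c)
{
	if (is_ascii(c) && c >= 'A' && c <= 'Z')
		return c + ('a' - 'A');	/* 'a' - 'A' == 32 in ASCII */
	return c;
}

/* Same shape, but with the offset as written in the quoted 0.01 macro:
 * 'a' + 'A' == 97 + 65 == 162, so the result is not a lowercase letter. */
static int buggy_tolower(int c)
{
	if (c >= 'A' && c <= 'Z')
		return c + ('a' + 'A');
	return c;
}
```

With these, ascii_tolower('A') gives 'a' and ascii_tolower(0xC4) gives 0xC4 back untouched, while buggy_tolower('A') gives 227 rather than 'a'.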