[now adding hch because his name is on the original patches from 2008]

On Tue, Apr 04, 2023 at 11:06:27AM -0700, Linus Torvalds wrote:
> On Tue, Apr 4, 2023 at 10:07 AM Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
> >
> > Now that we've made kernel and userspace use the same tolower code for
> > computing directory index hashes, add that to the selftest code.
>
> Please just delete this test. It's really really fundamentally wrong.
>
> The fact that you even *think* that you use the same tolower() as user
> space does shows just that you don't even understand how user space
> works.

Wrong. I'm well aware that userspace tolower and kernel tolower are
*NOT* the same thing.

I'm trying to **STOP USING** tolower in XFS.

I'm **replacing** tolower with a new function that substitutes specific
bytes with other bytes, and I'm redefining the ondisk format to say that
names remain arbitrary sequences of bytes that do not include nulls or
slashes but that hash indexing and lookups will apply the above byte
transformation. The rules for that transformation may or may not
coincide with what anyone thinks is "upper case in ASCII", but that's
irrelevant. The rules will be the same in all places, and they will not
break any existing filesystems.

Maybe I should've named it xfs_ItB8n_ci_o43jM28() to make it clearer
that I don't care what ascii is, nor does it matter here.

Perhaps I should have written "Now that we've made kernel and userspace
perform the same mathematical transformation on dirent name byte arrays
before hashing, add that to the selftest code as well."

Christoph and Barry Naujok should have defined specifically the exact
transformation and the permitted ranges of name inputs when they wrote
the feature. I wasn't around when this feature was invented...

> Really. The only thing this series shows is that you do not understand
> the complexities.

...and I don't think they understood the complexities when the code was
written.

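To make "substitutes specific bytes with other bytes" concrete, here is a
minimal sketch of what I mean. It is not the patch itself. The function
names are made up for illustration, the three byte ranges are an assumption
about which bytes get substituted, and the hash is a placeholder rather than
the real directory hash. The only point is that nothing in it consults
<ctype.h> or the process locale, so it gives the same answer in the kernel
and in userspace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical name.  Map one dirent name byte to the byte that is used
 * for hashing and comparison.  The ranges below are assumptions chosen
 * for illustration; what matters is that they are fixed constants and
 * that no locale or <ctype.h> table is ever consulted.
 */
unsigned char name_byte_xfrm(unsigned char c)
{
	if (c >= 0x41 && c <= 0x5a)
		return (unsigned char)(c + 0x20);
	if (c >= 0xc0 && c <= 0xd6)
		return (unsigned char)(c + 0x20);
	if (c >= 0xd8 && c <= 0xde)
		return (unsigned char)(c + 0x20);
	return c;
}

/*
 * Hypothetical name.  Placeholder rotate-and-xor hash over the
 * transformed bytes.  This is NOT the real directory hash; it only
 * shows that the transformation is applied before hashing.
 */
uint32_t name_hash_xfrm(const unsigned char *name, size_t len)
{
	uint32_t hash = 0;
	size_t i;

	for (i = 0; i < len; i++)
		hash = name_byte_xfrm(name[i]) ^ ((hash << 7) | (hash >> 25));
	return hash;
}

/* Sanity checks that hold regardless of any setlocale() call. */
void name_xfrm_selfcheck(void)
{
	assert(name_byte_xfrm(0x41) == 0x61);	/* inside the first range */
	assert(name_byte_xfrm(0xc4) == 0xe4);	/* inside the second range */
	assert(name_byte_xfrm(0xd7) == 0xd7);	/* in the gap, unchanged */
	assert(name_byte_xfrm(0x2e) == 0x2e);	/* outside all ranges */
	assert(name_hash_xfrm((const unsigned char *)"FOO", 3) ==
	       name_hash_xfrm((const unsigned char *)"foo", 3));
}
```

Calling name_xfrm_selfcheck() before and after any setlocale() gives the
same result, because the table is part of the function and not of the
environment.
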
> Lookie here: compile and run this program:
>
>     #include <stdio.h>
>     #include <ctype.h>
>     #include <locale.h>
>
>     int main(int argc, char **argv)
>     {
>         printf("tolower(0xc4)=%#x\n", tolower(0xc4));
>         setlocale(LC_ALL, "C");
>         printf("tolower(0xc4)=%#x\n", tolower(0xc4));
>         setlocale(LC_ALL, "sv_SE.iso88591");
>         printf("tolower(0xc4)=%#x\n", tolower(0xc4));
>     }
>
> and on my machine, I get this:
>
>     tolower(0xc4)=0xc4
>     tolower(0xc4)=0xc4
>     tolower(0xc4)=0xe4
>
> and the important thing to note is that "on my machine". The first
> line could be *different* on some other machine (and the last line
> could be too: there's no guarantee that the sv_SE locale even exists).
>
> So this whole "kernel and userspace use the same tolower code"
> sentence is simply completely and utterly wrong. It's not even "wrong"
> in the sense of "that's not true". It's "wrong" in the sense "that
> shows that you didn't understand the problem at all".
>
> Put another way: saying "5+8=10" is wrong. But saying "5+8=tiger" is
> nonsensical.
>
> Your patches are nonsensical.

I disagree. I'm saying that 5 💩 8 = tiger because that's what the 💩
operator does. 💩 is not +, even if 4 💩 8 = 12.

You claim to understand the complexities; how would /you/ fix this?
I'll send along the test cases that reproduce the problems.

--D

>                 Linus