On Tue, Apr 04, 2023 at 11:32:14AM -0700, Darrick J. Wong wrote:
> On Tue, Apr 04, 2023 at 10:54:27AM -0700, Linus Torvalds wrote:
> > On Tue, Apr 4, 2023 at 10:07 AM Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
> > >
> > > +       if (c >= 0xc0 && c <= 0xd6)     /* latin A-O with accents */
> > > +               return true;
> > > +       if (c >= 0xd8 && c <= 0xde)     /* latin O-Y with accents */
> > > +               return true;
> >
> > Please don't do this.
> >
> > We're not in the dark ages any more. We don't do crazy locale-specific
> > crud. There is no such thing as "latin1" any more in any valid model.
> >
> > For example, it is true that 0xC4 is 'Ä' in Latin1, and that the
> > lower-case version is 'ä', and you can do the lower-casing exactly the
> > same way as you do for US-ASCII: you just set bit 5 (or "add 32" or
> > "subtract 0xE0" - the latter is what you seem to do, crazy as it is).
> >
> > So the above was fine back in the 80s, and questionably correct in the
> > 90s, but it is COMPLETE GARBAGE to do this in the year 2023.
>
> Yeah, I get that. Fifteen years ago, Barry Naujok and Christoph merged
> this weird ascii-ci feature for XFS that purportedly does ... something.
> It clearly only works properly if you force userspace to use latin1,
> which is totally nuts in 2023 given that the distros default to UTF8
> and likely don't test anything else. It probably wasn't even a good
> idea in *2008*, but it went in anyway. Nobody tested this feature,
> metadump breaks with this feature enabled, but as maintainer I get to
> maintain these poorly designed half baked projects.

It was written specifically for an NFS/CIFS fileserver appliance
product and Samba needed filesystem-side CI to be able to perform
even vaguely well on industry-standard fileserver benchmarketing
workloads that were all the rage at the time.

Because of the inherent problems with CI and UTF-8 encoding, etc, it
was decided that Samba would be configured to export latin1
encodings as that covered >90% of the target markets for the
product. Hence the "ascii-ci" casefolding code could be encoded into
the XFS directory operations and remove all the overhead of
casefolding from Samba.

In various "important" directory benchmarketing workloads, ascii-ci
resulted in speedups of 100-1000x. These were competitive results
compared to the netapp/bluearc/etc appliances of the time in the
same cost bracket.

Essentially, it was a special case solution to meet a specific
product requirement and was never intended to be used outside that
specific controlled environment.

Realistically, this is the one major downside of the "upstream
first" development principle. i.e. when the vendor product that
required a specific feature is long gone, upstream still has to
support that functionality even though there may be no users of it
remaining and/or no good reason for it continuing to exist.

I'd suggest that after this is fixed we deprecate ascii-ci and make
it go away at the same time v4 filesystems go away. It was, after
all, a feature written for v4 filesystems....

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx