Re: [PATCH 1/3] xfs: stabilize the tolower function used for ascii-ci dir hash computation

On Tue, Apr 04, 2023 at 11:12:56PM -0700, Christoph Hellwig wrote:
> On Tue, Apr 04, 2023 at 11:32:14AM -0700, Darrick J. Wong wrote:
> > Yeah, I get that.  Fifteen years ago, Barry Naujok and Christoph merged
> > this weird ascii-ci feature for XFS that purportedly does ... something.
> > It clearly only works properly if you force userspace to use latin1,
> > which is totally nuts in 2023 given that the distros default to UTF8
> > and likely don't test anything else.  It probably wasn't even a good
> > idea in *2008*, but it went in anyway.  Nobody tested this feature;
> > metadump breaks with this feature enabled, but as maintainer I get to
> > maintain these poorly designed, half-baked projects.
> 
> IIRC the idea was that it should do 7-bit ASCII only, so even accepting
> Latin 1 characters seems like a bug compared to what it was documented
> to do.
> 
> > I wouldn't ever enable this feature on any computer I use, and I think
> > the unicode case-insensitive stuff that's been put in to ext4 and f2fs
> > lately are not a tarpit that I ever want to visit in XFS.  Directory
> > names should be sequences of bytes that don't include nulls or slashes,
> > end of story.
> 
> That works fine if all you care about is classic Linux / Unix users.  And
> while I'd prefer if all the world was like that, the utf8 based CI
> has real use cases.  Initially mostly for Samba file serving, but
> apparently Wine also really benefits from it, so some people have CI
> directories for that.  XFS ignoring this means we are missing out on
> those users.

<shrug> Welllll... if someone presents a strong case for adopting the
utf8 casefolding feature that f2fs and ext4 added some ways back, I
could be persuaded to import that, bugs and all, into XFS.  However,
given all the weird problems I've uncovered with "ascii"-ci, I'm going
to be very hard-nosed about adding test cases and making sure /all/ the
tooling works properly.

I wasn't thrilled at all with the "Handle invalid UTF8 sequence as either an
error or as an opaque byte sequence." that went into the ext4 code.
While I concede that it's the least-legacy-code-regressive solution to
people demanding to create non-utf8 filenames on a "utf8-casefold"
filesystem, it's just ... compromised.

Really it's "utf8 casefolded lookups if all the names you create are
valid utf8 byte sequences, and if you fail at that then we fall back to
memcmp(); also there's a strict-utf8 creat mode but you can't enable it".

Gross.

> The irony is all the utf8 infrastructure was developed for XFS use
> by SGI, never made it upstream back then and got picked up for ext4.
> And while it is objectively horrible, plugging into this actually
> working infrastructure would be the right thing for XFS instead
> of the wacky ASCII-only mode, which was only done as a stepping stone while
> the utf8 infrastructure got finished.

fsdevel, the gift that keeps on giving...

--D
