Re: [f2fs-dev] [PATCH v6 0/9] Support negative dentries on case-insensitive ext4 and f2fs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Nov 20, 2023 at 07:03:13PM -0800, Linus Torvalds wrote:
> On Mon, 20 Nov 2023 at 18:29, Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > It's a bit complicated, yes. But no, doing things one unicode
> > character at a time is just bad bad bad.
> 
> Put another way: the _point_ of UTF-8 is that ASCII is still ASCII.
> It's literally why UTF-8 doesn't suck.
> 
> So you can still compare ASCII strings as-is.
> 
> No, that doesn't help people who are really using other locales, and
> are actively using complicated characters.
> 
> But it very much does mean that you can compare "Bad" and "bad" and
> never ever look at any unicode translation ever.

Yeah, agreed, that would be a nice optimization.  However, in the
unfortunate case where (a) it's non-ASCII, and (b) the input string is
non-normalized and/or differs in case, we end up scanning some portion
of the two strings twice; once doing the strcmp, and once doing the
Unicode slow path.

That being said, given that even in the case where we're dealing with
non-ASCII strings, in the fairly common case where the program is
doing a readdir() followed by a open() or stat(), the filename will be
byte-identical and so a strcmp() will suffice.

So I agree that it's a nice optimization.  It'd be interesting how
much such an optimization would actually show up in various
benchmarks.  It'd have to be something that was really metadata-heavy,
or else the filenamea lookups would get drowned out.

   	    	      	      	    	- Ted




[Index of Archives]     [Reiser Filesystem Development]     [Ceph FS]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite National Park]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Media]

  Powered by Linux