Re: git on MacOSX and files with decomposed utf-8 file names

On Jan 21, 2008, at 4:17 PM, Martin Langhoff wrote:

On Jan 22, 2008 9:53 AM, Kevin Ballard <kevin@xxxxxx> wrote:
On Jan 21, 2008, at 3:33 PM, Linus Torvalds wrote:
Umm. What's this inability to see that data is data is data?

I'm not sure what you mean. I stated a fact - at least on OS X, the
filename does not contribute to the listed filesize, so changing the
encoding of the filename doesn't change the filesize. This isn't a
philosophical point, it's a factual statement.

Kevin,

as you might know, Linus' "other hobby" is to write kernels ;-) From
that POV, a filename is as much data as the data in the file. Doing
odd things like sorting it, searching through it, etc, is all work for
code higher in the stack that is free to mangle the data in any way it
wants, including creating nice case-insensitive indexes, and
who-knows-what for ideogram-based languages. In contrast, the core OS
treats user data as sacred stuff, and I'm thankful it does.

That's certainly a reasonable POV. However, it's not the only one. As evidenced by the Mac, treating filenames as strings rather than bytes is a viable alternative POV - you can't argue that it doesn't work, because OS X proves it does.

However, it is a trade-off.

And from a kernel/filesystem POV, a directory is also a file. So if a
filename has a different number of octets, the directory will be
different.

Sure, that makes sense. That's why, if you are going to mangle filenames, you need to pick a stable form to always use, which HFS+ does.
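
To make the octet-count point concrete, here is a minimal Python sketch. It assumes plain NFD from the standard `unicodedata` module as the "stable form"; HFS+ uses its own variant of canonical decomposition, so treat this as an approximation rather than what the filesystem actually does. The helper name `stable_form` is made up for illustration.

```python
import unicodedata


def stable_form(name: str) -> str:
    """Return the name in canonical decomposed form (NFD).

    Assumption: plain NFD stands in for the filesystem's own
    decomposition rules, which are not identical.
    """
    return unicodedata.normalize("NFD", name)


composed = "Mart\u00edn"      # í as a single precomposed code point
decomposed = "Marti\u0301n"   # i followed by a combining acute accent

# Same name to a human, different number of octets on disk.
print(len(composed.encode("utf-8")))    # 7
print(len(decomposed.encode("utf-8")))  # 8

# A byte-for-byte comparison sees two different names...
print(composed == decomposed)           # False

# ...but both collapse to the same string once a single form is enforced.
print(stable_form(composed) == stable_form(decomposed))  # True
```

Whichever form is picked, the important part is that it is applied every time, so the same name always produces the same octets in the directory.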

For all the searching and matching, it really makes sense to have
something like locate or SpotLight or whatever to index user files
that should be easy to find and match, because all the locale rules
for matching are hideously expensive to apply. Even today, most UTF-8
aware (and supposedly collation-smart) applications have trouble
matching MARTÍN when asked for martín in a case-insensitive search.
That pesky Latin í trips them up every time.


Perhaps you should try OS X. Every single Cocoa app should do the search properly. In fact, I just checked using 3 different text engines (WebKit, Cocoa's text engine, and ATSUI) and all 3 did the case-insensitive search properly. That said, this isn't particularly relevant.
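
For reference, the MARTÍN / martín case can be handled without any locale machinery. The following is a rough Python sketch, assuming Unicode case folding plus NFD normalization is "good enough"; it is not real locale-aware collation and it is not how Cocoa or any of the text engines mentioned above do it. The names `fold_key` and `contains_caseless` are made up for illustration.

```python
import unicodedata


def fold_key(text: str) -> str:
    """Build a comparison key: decompose, case-fold, decompose again.

    The second NFD pass is there because case folding can produce
    characters that are themselves not in decomposed form.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFD", decomposed.casefold())


def contains_caseless(haystack: str, needle: str) -> bool:
    """Case-insensitive, normalization-insensitive substring test."""
    return fold_key(needle) in fold_key(haystack)


# Precomposed upper case vs. precomposed lower case.
print(contains_caseless("Hola MART\u00cdN", "mart\u00edn"))  # True

# Decomposed upper case vs. precomposed lower case.
print(contains_caseless("MARTI\u0301N", "mart\u00edn"))      # True

# The accent still matters: plain "MARTIN" is a different name.
print(contains_caseless("MARTIN", "mart\u00edn"))            # False
```

This deliberately ignores language-specific rules (the Turkish dotless i, for example), which is where the expensive part of proper collation lives.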

-Kevin Ballard

--
Kevin Ballard
http://kevin.sb.org
kevin@xxxxxx
http://www.tildesoft.com

