Re: git on MacOSX and files with decomposed utf-8 file names

On Thu, 17 Jan 2008, Wincent Colaiuta wrote:
>
> (the day I have two files in the same directory called "Märchen" and 
> want to specify one of them on the command line I'll worry about that 
> when I come to it).

Side note: the thing is, the reason people shouldn't worry about it is 
that this is a *trivial* thing to handle. You really don't even need to 
know what you're doing. And you can test it today, easily.

Having two (differently encoded) files like that is really no different 
from the traditional UNIX FAQ of "how do I remove a file starting with 
'-'" or even more closely "how do I remove a file that has a character in 
it that I cannot get at the keyboard".
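For reference, the dash case from that FAQ can be handled like this. This is a minimal POSIX sh sketch, assuming a `mktemp` that supports `-d` without a template (as GNU coreutils and recent BSDs do); the file name `-oddname` is hypothetical and chosen only for illustration:

```shell
#!/bin/sh
# Work in a throwaway directory so nothing outside it is touched.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Create a file whose name starts with a dash (hypothetical name).
echo data > ./-oddname

# A plain "rm -oddname" would be parsed as options to rm.
# "--" ends option parsing, so the next word is always a file name.
# Prefixing the name with "./" works just as well: rm ./-oddname
rm -- -oddname

if [ -e ./-oddname ]; then removed=no; else removed=yes; fi
echo "removed: $removed"

cd / && rm -rf "$dir"
```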

In other words, on a bog-standard UNIX (and yes, in this case, I bet OS X 
works fine too for this test), just try this:

	filename1=$(printf 'hello\002there')
	filename2=$(printf 'hello\003there')
	echo Odd file > "$filename1"
	echo Another odd file > "$filename2"

and now you have two filenames that are actually rather hard to type on the 
command line. In fact, for me they even *look* the same:

	[torvalds@woody ~]$ ll hello*
	-rw-rw-r-- 1 torvalds torvalds  9 2008-01-17 08:23 hello?there
	-rw-rw-r-- 1 torvalds torvalds 17 2008-01-17 08:23 hello?there

See?

Even in my graphical file browser, those two filenames look 100% *identical*. 
I could give you a screenshot, but I'm lazy. Just take my word for it, or 
fire up Konqueror on Linux (though the result may well depend on the 
particular font you're using).

[ And yes, other file browsers might show them as different characters: 
  depending on the font, each might show up as a small box with [00 02] 
  vs [00 03] in it, for example. But that's also 100% true of the two 
  different encodings of 'ä': you could easily have a file browser that 
  shows the decomposed multi-character form as a multi-character 
  sequence, precisely to distinguish them and show that one of them 
  isn't "normalized"!

  The point is, as long as the filesystem doesn't corrupt the data, it's 
  always easy to get at, and there is never any ambiguity. ]
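One way to see that there is no ambiguity is to dump the bytes of each name. A minimal POSIX sh sketch, assuming `mktemp -d` and the standard `od` utility; it builds the names with `printf` octal escapes rather than `echo -e`, which is not portable to every `sh`:

```shell
#!/bin/sh
dir=$(mktemp -d)
cd "$dir" || exit 1

# The same two names as above: \002 and \003 are control bytes
# that most terminals and fonts cannot display.
filename1=$(printf 'hello\002there')
filename2=$(printf 'hello\003there')
echo Odd file > "$filename1"
echo Another odd file > "$filename2"

# ls may render both as "hello?there", but od -c prints every byte,
# so the one that differs (002 vs 003) is plain to see.
bytes1=$(printf '%s' "$filename1" | od -A n -c)
bytes2=$(printf '%s' "$filename2" | od -A n -c)
printf '%s\n%s\n' "$bytes1" "$bytes2"

# Count the directory entries the glob sees.
n=0
for f in hello*; do
    n=$((n + 1))
done
echo "files: $n"

cd / && rm -rf "$dir"
```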

How is this different from "Märchen" spelled with two different encodings 
for that "ä"?

I'll tell you: it's not at all different. It's 100% the exact same issue.
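The same experiment works with the two spellings of "Märchen". In UTF-8, the precomposed 'ä' (U+00E4) is the bytes C3 A4, while the decomposed form is 'a' followed by U+0308 COMBINING DIAERESIS (bytes CC 88). The sketch below assumes a filesystem that stores names as opaque bytes, such as ext4 on Linux; on a normalizing filesystem (for example HFS+ on OS X) the second name would instead refer to the first file, and the count would be 1:

```shell
#!/bin/sh
dir=$(mktemp -d)
cd "$dir" || exit 1

# Precomposed: U+00E4, UTF-8 bytes C3 A4 (octal 303 244).
nfc=$(printf 'M\303\244rchen')
# Decomposed: "a" + U+0308, UTF-8 bytes 61 CC 88 (octal 314 210 after the "a").
nfd=$(printf 'Ma\314\210rchen')

echo precomposed > "$nfc"
echo decomposed > "$nfd"

# On a byte-preserving filesystem these are two separate directory
# entries, exactly like the hello\002there / hello\003there pair.
n=0
for f in M*rchen; do
    n=$((n + 1))
done
c1=$(cat "$nfc")
c2=$(cat "$nfd")
echo "entries: $n"

cd / && rm -rf "$dir"
```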

And does that make you perhaps go "Hunh? How do I remove it, or open it?"

And the fact is, those "identical-looking" filenames (and thus they must be 
the same, and something should have normalized them to the same thing, 
no?) are obviously two different files, and they are *really*easy* to edit 
and look at.
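On the command line, "open it" and "remove it" just mean spelling the odd byte, which `printf` can do inside a command substitution. A minimal POSIX sh sketch under the same `mktemp -d` assumption, reading one file and deleting only the other:

```shell
#!/bin/sh
dir=$(mktemp -d)
cd "$dir" || exit 1

echo Odd file > "$(printf 'hello\002there')"
echo Another odd file > "$(printf 'hello\003there')"

# Read the \002 file by spelling its control byte with printf.
first=$(cat "hello$(printf '\002')there")
echo "$first"

# Remove only the \003 file; the \002 file is left alone.
rm -- "hello$(printf '\003')there"

if [ -e "hello$(printf '\002')there" ]; then kept=yes; else kept=no; fi
if [ -e "hello$(printf '\003')there" ]; then gone=no; else gone=yes; fi

cd / && rm -rf "$dir"
```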

Fire up that graphical browser again, and it doesn't even matter whether 
the filename looks identical or not, it shows up as two different files, 
and you can drag them around independently, rename them there, and at 
least my file browser shows clearly which is which, because I get a small 
icon with a preview in it, so I directly see which one is the "Odd file" 
and which one is the "Another odd file".

So the whole "but they _look_ the same" argument is just total BS. In just 
about every character encoding there have always been distinct 
"characters" that _look_ the same on screen, but that has never actually 
made them *be* the same, and it has never been a valid argument for 
considering them the same.

Because even when they *look* the same, the file browser that didn't show 
the difference in names visually still showed them correctly as two 
separate files, and I could still just rename them by hand by 
right-clicking on them and picking "rename".
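The command-line equivalent of that right-click rename is an ordinary `mv`. A minimal POSIX sh sketch, again assuming `mktemp -d`; the target names `hello-odd` and `hello-another-odd` are hypothetical and only chosen to be easy to type:

```shell
#!/bin/sh
dir=$(mktemp -d)
cd "$dir" || exit 1

echo Odd file > "$(printf 'hello\002there')"
echo Another odd file > "$(printf 'hello\003there')"

# Give each file a name that is easy to type; the contents go with it.
mv -- "hello$(printf '\002')there" hello-odd
mv -- "hello$(printf '\003')there" hello-another-odd

a=$(cat hello-odd)
b=$(cat hello-another-odd)
printf '%s\n%s\n' "$a" "$b"

cd / && rm -rf "$dir"
```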

So "look the same" is really not a new thing, nor is it even a really hard 
thing. Yes, people can get confused by it, but hey, people can get 
confused by *anything*. People get confused by filenames starting with a 
"-", yet nobody sane really says that filenames cannot start with a dash.

			Linus