On Thu, 17 Jan 2008, Wincent Colaiuta wrote:
>
> (the day I have two files in the same directory called "Märchen" and
> want to specify one of them on the command line I'll worry about that
> when I come to it).

Side note: the thing is, the reason people shouldn't worry about it is that this is a *trivial* thing to handle. You really don't even need to know what you're doing. And you can test it today, easily.

Having two (differently encoded) files like that is really no different from the traditional UNIX FAQ of "how do I remove a file starting with '-'" or even more closely "how do I remove a file that has a character in it that I cannot get at the keyboard".

In other words, on a bog-standard UNIX (and yes, in this case, I bet OS X works fine too for this test), just try this

    filename1=$(echo -e "hello\002there")
    filename2=$(echo -e "hello\003there")
    echo Odd file > "$filename1"
    echo Another odd file > "$filename2"

and now you have two filenames that are actually rather hard to type on the command line. In fact, for me they even *look* the same:

    [torvalds@woody ~]$ ll hello*
    -rw-rw-r-- 1 torvalds torvalds  9 2008-01-17 08:23 hello?there
    -rw-rw-r-- 1 torvalds torvalds 17 2008-01-17 08:23 hello?there

See? Even in my graphical browser, those two filenames look 100% *identical*. I could give you a screen-shot, but I'm lazy. Just take my word for it, or just fire up konqueror on Linux (but it may well depend on the particular font you're using).

[ And yes, for other browsers, you might have something that shows them as different characters - depending on the font, it might show up as a small box with [00 02] vs [00 03] in it, for example. But that's also actually 100% true of the two different encodings of 'ä' - you could easily have a file browser that shows the multi-character as a multi-character, exactly to distinguish them and show that one of them isn't "normalized"!
  The point is, once the filesystem doesn't corrupt the data, it's always easy to get at, and there is never any ambiguity. ]

How is this different from "Märchen" spelled with two different encodings for that "ä"? I'll tell you: it's not at all different. It's 100% the exact same issue.

And does that make you perhaps go "Hunh? How do I remove it, or open it?"

And the fact is, those "identical looking" filenames (and thus they must be the same, and something should have normalized them to the same thing, no?) are obviously two different files, and they are *really*easy* to edit and look at. Fire up that graphical browser again, and it doesn't even matter whether the filename looks identical or not, it shows up as two different files, and you can drag them around independently, rename them there, and at least my file browser shows clearly which is which, because I get a small icon with a preview in it, so I directly see which one is the "Odd file" and which one is the "Another odd file".

So the whole "but they _look_ the same" argument is just total BS. In just about all character encodings there have always been unique and different "characters" that _look_ the same on screen, and it has never really made them actually *be* the same, and it has never been a valid argument for them being considered the same.

Because even when they *look* the same, that file browser that didn't show the difference in names visually still showed them correctly as two separate files, and I could still just rename them by hand by right-clicking on them and picking "rename".

So "look the same" is really not a new thing, nor is it even a really hard thing. Yes, people can get confused by it, but hey, people can get confused by *anything*. People get confused by filenames starting with a "-", yet nobody sane really says that filenames cannot start with a dash.
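
For the command-line side of the same point, here is a minimal sketch, assuming a POSIX sh and a filesystem that stores names as raw bytes without normalizing them. It recreates the two files from the example above in a scratch directory, reaches both through a glob instead of typing the names, and renames one of them by content. The names demo_dir, count and odd are made up for illustration, and printf stands in for the less portable echo -e.

```shell
# Work in a scratch directory so nothing else matches the glob.
# (demo_dir, count and odd are hypothetical names for this sketch.)
demo_dir=$(mktemp -d) || exit 1
cd "$demo_dir" || exit 1

# Same two names as in the mail: they differ only in one control byte.
filename1=$(printf 'hello\002there')
filename2=$(printf 'hello\003there')
echo "Odd file" > "$filename1"
echo "Another odd file" > "$filename2"

# '?' matches any single character, including ones you cannot type,
# so the glob reaches both files without spelling out either name.
count=0
for f in hello?there; do
    count=$((count + 1))
    # od -c prints the raw bytes of the name, which makes the
    # otherwise invisible difference (002 vs 003) visible.
    printf '%s' "$f" | od -c
    cat -- "$f"
done
echo "files found: $count"

# Pick one file by its content and give it a name that is easy to type.
odd=$(grep -lx 'Odd file' hello?there)
mv -- "$odd" odd-file.txt
```

The same glob-and-rename approach should carry over to two differently encoded "Märchen" files, as long as the filesystem keeps both byte sequences intact.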

		Linus