On Thu, 17 Jan 2008, Pedro Melo wrote:
>
> We have people using windows, people using Macs, and people using several
> flavors of Linux desktops. They all have different settings and if I add a
> file like áéióú that happens to be UTF-8 encoded, it will reach a iso-latin-1
> user as visual garbage.

Yes.

> git will track the file perfectly, we know that, because the sequence of
> bytes that my system used to create the file will be the same on all
> "sane" systems, but the file will look "funny" to some users, and we get
> complaints for some less enlightened ones.

I can't really suggest anything else than trying to make everybody use
UTF-8.

[ Not just for filenames, by the way - this is one of the reasons I think
  it is so *important* to not corrupt filenames, exactly because this is
  in no way filename-specific at all, and filenames are generally "textual
  data" exactly the same way a text-file is. But only totally insane
  people think that you should force-normalize text-files, even though all
  the issues are obviously all the same regardless of whether it's a
  filename or a word in textfile. ]

And yes, I also realize that it's not going to be realistic. We're
probably *closer* to that than we used to be, but I don't think you can
even make Windows think FAT is UTF-8. I don't know how NTFS works (I know
it is Unicode-aware, and I think it encodes filenames in UCS-2 or possibly
UTF-16, but there is an obvious 1:1 translation to UTF-8, and since we use
C strings, I'd assume/hope Windows actually uses that unambiguous
translation for any filenames).

Under modern Linux and OS X, UTF-8 is basically the only way (older Linux
distros may be set up for Latin1, but at least the newer ones seem to all
default to a UTF-8 locale).

> The answer is that users should not create filenames with non-ascii characters
> if they want a consistent experience, right?

Oh, absolutely. That takes care of 99.9% of all source projects.
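
[ A minimal Python sketch of the "visual garbage" effect described above,
  assuming the example name is stored in precomposed form. The variable
  names are made up for illustration. It shows that the bytes stay the same
  and only their interpretation differs, and that going from UTF-16 to
  UTF-8 loses nothing for this name. ]

```python
# The example filename from the quoted message, written with explicit
# escapes so the precomposed form is unambiguous (an assumption).
name = "\u00e1\u00e9i\u00f3\u00fa"

# Bytes a UTF-8 system writes for that name; git tracks exactly these bytes.
utf8_bytes = name.encode("utf-8")

# The same bytes as displayed by a user whose locale is ISO-8859-1 (Latin-1).
# Every two-byte UTF-8 sequence turns into two unrelated characters,
# so the name reads as "Ã¡Ã©iÃ³Ãº".
seen_as_latin1 = utf8_bytes.decode("latin-1")

# Bytes a Latin-1 system would have written for the same visible name.
latin1_bytes = name.encode("latin-1")

# UTF-16 and UTF-8 encode the same code points, so converting a valid
# UTF-16 name to UTF-8 gives the same bytes as encoding it directly.
via_utf16 = name.encode("utf-16-le").decode("utf-16-le").encode("utf-8")

print(utf8_bytes, len(utf8_bytes))
print(latin1_bytes, len(latin1_bytes))
print(via_utf16 == utf8_bytes)
```
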
Even then you can have problems with case insensitivity (the Linux kernel
sources are all US-ASCII filenames, for example, but *literally* has many
files that are identical if you ignore case, and that's not unheard of).

So yes, to a first approximation, the answer is to simply avoid using
anything but US-ASCII. It's seldom a big limitation when talking about
filenames.

		Linus
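
[ The case-insensitivity problem mentioned above can be checked for
  mechanically. What follows is only a sketch: the function name
  case_collisions and the sample paths are hypothetical, and
  str.casefold() is just an approximation of what a case-insensitive
  filesystem does. In practice the path list could come from something
  like the output of "git ls-files". ]

```python
from collections import defaultdict


def case_collisions(paths):
    """Return groups of paths that differ only by letter case."""
    groups = defaultdict(list)
    for path in paths:
        # Paths that fold to the same key would clash on a
        # case-insensitive filesystem.
        groups[path.casefold()].append(path)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Hypothetical, pure US-ASCII paths used only for illustration.
paths = ["include/Filter.h", "include/filter.h", "Makefile", "README"]
collisions = case_collisions(paths)
print(collisions)
```
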