On Wed, 13 May 2009, Daniel Barkalow wrote:
> >
> > Now, the simple OS X case is not a huge problem, since the lstat will
> > succeed with the fixed-up filename too.
>
> I'm not seeing what the general case is, and how it could possibly behave.

Here's a simple example.

Let's say that your company uses Latin1 internally for your filesystems, because your tools really aren't utf-8 ready. This is NOT AT ALL unnatural - it's how lots of people used to work with Linux over the years, and it's largely how people still use FAT, I suspect (except it's not latin1, it's some windows-specific 8-bits-per-character mapping).

IOW, if you have a file called 'åäö', it literally is encoded as '\xe5\xe4\xf6' (if you wonder why I picked those three letters, it's because they are the regular extra letters in Swedish - Swedish has 29 letters in its alphabet, and those three letters really are letters in their own right, they are NOT 'a' and 'o' with some dots/rings on top).

IOW, if you open such a file, you need to use those three bytes.

Now, even if you happen to have an OS and use Latin1 on disk, you may realize that you'd like to interact with others that use UTF-8, and would want to have your git archive that you export use nice portable UTF-8.

But you absolutely MUST NOT just do a conversion at "readdir()" time. If you do that, then your three-byte filename turns into a six-byte utf-8 sequence of '\xc3\xa5\xc3\xa4\xc3\xb6' and the thing is, now "lstat()" won't work on that sequence.

So obviously you could always turn things _back_ for lstat(), but quite frankly, that's (a) insane (b) incompetent and (c) not even always well-defined.

> There's the "insensitive" behavior: if you create "foo" and look for
> "FOO", it's there, but readdir() reports "foo".
>
> There's the "converting" behavior: if you create "foo", readdir() reports
> "FOO", but lstat("foo") returns it.
Then there's the behaviour above: you want your git repository to have utf-8, but your filesystem doesn't convert anything at all, and all your regular tools (think editors etc) are all Latin1.

Latin1 is going away, I hope, but I bet EUC-JP etc still exist.

			Linus
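To make the byte-level argument concrete, here is a minimal C sketch (the function names `latin1_to_utf8` and `utf8_to_latin1` are hypothetical, not git code). It assumes strict ISO-8859-1, where every byte value maps to the Unicode code point of the same value. Converting 'åäö' turns the three on-disk bytes into six, so a readdir() that returned the converted name would hand lstat() a byte string that, on a filesystem that stores names as opaque bytes, names no file at all. The reverse direction shows point (c): a UTF-8 name containing anything above U+00FF (for example '€', U+20AC) simply has no Latin1 spelling to turn back into.

```c
#include <stddef.h>

/*
 * Latin1 (ISO-8859-1) -> UTF-8. Bytes below 0x80 are copied as-is;
 * bytes 0x80..0xFF become the two-byte sequence 110xxxxx 10xxxxxx.
 * 'out' must have room for 2 * len + 1 bytes.
 * Returns the number of bytes written, excluding the NUL terminator.
 */
static size_t latin1_to_utf8(const unsigned char *in, size_t len, char *out)
{
	size_t o = 0;

	for (size_t i = 0; i < len; i++) {
		unsigned char c = in[i];

		if (c < 0x80) {
			out[o++] = (char)c;
		} else {
			out[o++] = (char)(0xC0 | (c >> 6));   /* 0xC2 or 0xC3 */
			out[o++] = (char)(0x80 | (c & 0x3F)); /* low six bits */
		}
	}
	out[o] = '\0';
	return o;
}

/*
 * UTF-8 -> Latin1, i.e. "turning things back" for lstat().
 * Only ASCII and the lead bytes 0xC2/0xC3 (code points U+0080..U+00FF)
 * have a Latin1 equivalent; anything else makes the mapping undefined,
 * and this sketch returns -1.
 * 'out' must have room for len + 1 bytes.
 */
static long utf8_to_latin1(const char *in, size_t len, unsigned char *out)
{
	size_t o = 0;
	size_t i = 0;

	while (i < len) {
		unsigned char c = (unsigned char)in[i];

		if (c < 0x80) {
			out[o++] = c;
			i += 1;
		} else if ((c == 0xC2 || c == 0xC3) && i + 1 < len &&
			   ((unsigned char)in[i + 1] & 0xC0) == 0x80) {
			out[o++] = (unsigned char)(((c & 0x1F) << 6) |
						   ((unsigned char)in[i + 1] & 0x3F));
			i += 2;
		} else {
			return -1; /* above U+00FF (or malformed): no Latin1 form */
		}
	}
	out[o] = '\0';
	return (long)o;
}
```

Feeding `{ 0xe5, 0xe4, 0xf6 }` to `latin1_to_utf8` yields the six bytes `\xc3\xa5\xc3\xa4\xc3\xb6` quoted above, while `utf8_to_latin1` rejects `"\xe2\x82\xac"` ('€'). Note that a Windows code page such as cp1252 does place '€' at 0x80, which is one reason "just convert it back" depends on knowing exactly which 8-bit mapping the disk uses.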