Re: Cross-Platform Version Control

On Wed, 13 May 2009, Daniel Barkalow wrote:
> > 
> > Now, the simple OS X case is not a huge problem, since the lstat will 
> > succeed with the fixed-up filename too.
> 
> I'm not seeing what the general case is, and how it could possibly behave.

Here's a simple example.

Let's say that your company uses Latin1 internally for your filesystems, 
because your tools really aren't utf-8 ready. 

This is NOT AT ALL unnatural - it's how lots of people used to work with 
Linux over the years, and it's largely how people still use FAT, I suspect 
(except it's not latin1, it's some windows-specific 8-bits-per-character 
mapping).


IOW, if you have a file called 'åäö', it literally is encoded as 
'\xe5\xe4\xf6' (if you wonder why I picked those three letters, it's 
because they are the regular extra letters in Swedish - Swedish has 29 
letters in its alphabet, and those three letters really are letters in 
their own right, they are NOT 'a' and 'o' with some dots/rings on top).

IOW, if you open such a file, you need to use those three bytes.
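
As a quick check of those byte values, here is a minimal Python sketch
(Python is used purely for illustration; nothing in this thread depends
on it). The variable names are made up for the example.

```python
# The precomposed letters å, ä, ö, written as escapes so that the
# encoding of this source file cannot change what is being shown.
name = "\u00e5\u00e4\u00f6"

# Latin1 stores each of these letters as exactly one byte.
on_disk = name.encode("latin-1")

print(on_disk)        # b'\xe5\xe4\xf6'
print(len(on_disk))   # 3
```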

Now, even if your OS and tools happen to use Latin1 on disk, you may 
realize that you'd like to interact with others who use UTF-8, and would 
want the git archive that you export to use nice portable UTF-8.

But you absolutely MUST NOT just do a conversion at "readdir()" time. If 
you do that, then your three-byte filename turns into a six-byte utf-8 
sequence of '\xc3\xa5\xc3\xa4\xc3\xb6' and the thing is, now "lstat()" 
won't work on that sequence.
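
The failure can be sketched with a toy model in Python. This is not real
filesystem code: the dictionary stands in for a directory on a filesystem
that compares names byte for byte and converts nothing, and `toy_readdir`
and `toy_lstat` are hypothetical helpers invented for the illustration.

```python
# Toy model: a directory whose names are opaque bytes, matched exactly.
directory = {b"\xe5\xe4\xf6": "file contents"}   # Latin1 name on disk

def toy_readdir():
    """Return the names exactly as they are stored."""
    return list(directory)

def toy_lstat(name):
    """Succeed only on an exact byte match, as a non-converting filesystem would."""
    if name not in directory:
        raise FileNotFoundError(name)
    return {"size": len(directory[name])}

raw = toy_readdir()[0]

# The tempting conversion at readdir() time: Latin1 bytes -> UTF-8 bytes.
converted = raw.decode("latin-1").encode("utf-8")
print(converted)      # b'\xc3\xa5\xc3\xa4\xc3\xb6'

toy_lstat(raw)        # fine: these are the bytes actually on disk

try:
    toy_lstat(converted)
    found = True
except FileNotFoundError:
    found = False
print(found)          # False
```

On a real system the same mismatch would presumably show up as lstat()
failing with ENOENT, since no directory entry has the six-byte name.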

So obviously you could always turn things _back_ for lstat(), but quite 
frankly, that's (a) insane (b) incompetent and (c) not even always 
well-defined.
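
One possible reading of point (c), which the message does not spell out,
is that the reverse mapping simply does not exist for some names. The
Python sketch below assumes a hypothetical `back_to_latin1` helper and
shows two such cases: a character with no Latin1 form, and a decomposed
sequence that only maps back once a normalization has been chosen.

```python
import unicodedata

def back_to_latin1(utf8_name):
    """Hypothetical reverse step: UTF-8 name from the repository -> Latin1."""
    return utf8_name.decode("utf-8").encode("latin-1")

# The round trip works for a name that started out as Latin1.
print(back_to_latin1(b"\xc3\xa5\xc3\xa4\xc3\xb6"))   # b'\xe5\xe4\xf6'

# A name from a UTF-8 collaborator may have no Latin1 form at all.
euro = "\u20ac".encode("utf-8")          # EURO SIGN
try:
    back_to_latin1(euro)
    euro_ok = True
except UnicodeEncodeError:
    euro_ok = False

# 'a' followed by COMBINING RING ABOVE renders like the precomposed letter,
# but the combining character itself is outside Latin1.
decomposed = "a\u030a".encode("utf-8")
try:
    back_to_latin1(decomposed)
    decomposed_ok = True
except UnicodeEncodeError:
    decomposed_ok = False

# Only after choosing a normalization (NFC here) does it map to one byte.
normalized = unicodedata.normalize("NFC", "a\u030a").encode("latin-1")
print(euro_ok, decomposed_ok, normalized)   # False False b'\xe5'
```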

> There's the "insensitive" behavior: if you create "foo" and look for 
> "FOO", it's there, but readdir() reports "foo".
> 
> There's the "converting" behavior: if you create "foo", readdir() reports 
> "FOO", but lstat("foo") returns it.

Then there's the behaviour above: you want your git repository to have 
utf-8, but your filesystem doesn't convert anything at all, and all your 
regular tools (think editors etc) are all Latin1.

Latin1 is going away, I hope, but I bet EUC-JP etc still exist. 

		Linus