Re: [RFC PATCH] Windows: Assume all file names to be UTF-8 encoded.

Johannes Schindelin:

No. As far as Git is concerned, the file names are just as much blobs as the file contents.

I've struggled with the same problems on Linux before, since its file systems don't have the concept of characters, either. I guess it comes down to design principles, but as far as I am concerned, constructing file names from characters makes a lot more sense than constructing them from bytes.

Git does the right thing in assuming that commit messages and user names are UTF-8 text, though. It would have been nice to have file names covered by the same constraint.

The fact that Windows messes with this notion just as it messes with the file contents (think the endless story whose name is CR/LF) shows only how "well" designed the concepts in Windows are.

In this case, yes, Windows' way of doing things does make more sense, at least to me. And as far as text files are concerned, treating text as sequences of bytes is in most cases not a very smart thing to do, either, but it's hard not to, given how most computers are constructed.

And as it stands, we have at least two issues on the msysGit issue tracker that complain that Git does not work with localized file names properly.

So no, file names are not UTF-8 at all, especially not on Windows.

I am not trying to make file names *on Windows* be UTF-8. I am trying to make file names on Windows be Windows file names, i.e. UTF-16 Unicode. It's just that Git internally uses the char* APIs and, from what I have seen, in most other cases assumes that char* text is UTF-8, so I am trying to convert from Windows' view of path names to Git's (UTF-16 to UTF-8) and back.
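As a rough illustration of the round trip I mean (a sketch in Python, not the C code from the patch; the helper names to_git_path and to_windows_path are made up for this example, and the wide name is modelled as raw UTF-16LE bytes):

```python
def to_git_path(wide_name: bytes) -> bytes:
    """Windows view (UTF-16LE bytes) -> Git view (UTF-8 bytes)."""
    return wide_name.decode("utf-16-le").encode("utf-8")


def to_windows_path(git_name: bytes) -> bytes:
    """Git view (UTF-8 bytes) -> Windows view (UTF-16LE bytes)."""
    return git_name.decode("utf-8").encode("utf-16-le")


# A name with non-ASCII letters, as a wide-character API might hand it over.
wide = "räksmörgås.txt".encode("utf-16-le")
narrow = to_git_path(wide)

# For well-formed names the conversion is lossless: the round trip is exact.
assert to_windows_path(narrow) == wide
```

Both decoders here are strict, so ill-formed input (an unpaired surrogate, say) raises an error instead of being converted. A real implementation would have to decide what to do with such names.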

The other way would be to keep the char* APIs but convert to the Windows locale encoding (the "ANSI codepage"), but that would break horribly, as not all file names that can be used on a file system can be represented that way. Plus, every call to a Windows API that takes a char* path name *is* converted into UTF-16 anyway, since that is what the Windows NT subsystems use internally.
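To make the breakage concrete, here is a small simulation, assuming a Western European code page (cp1252) and using Python's "replace" error handler as a stand-in for whatever substitution Windows would actually perform. The helper name to_ansi is hypothetical:

```python
def to_ansi(name: str, codepage: str = "cp1252") -> bytes:
    """Squeeze a Unicode file name into a single-byte code page.

    Characters the code page lacks become "?"; this only approximates
    what a real char* API would do with them.
    """
    return name.encode(codepage, errors="replace")


# Names within the code page survive unchanged.
assert to_ansi("räksmörgås.txt") == "räksmörgås.txt".encode("cp1252")

# Names outside it do not, and two different files collapse into one name.
assert to_ansi("日本語.txt") == b"???.txt"
assert to_ansi("한국어.txt") == b"???.txt"
```

Once two distinct files map to the same byte string, there is no way to convert back, which is why I would rather go through UTF-8.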

Do not get me wrong, I really welcome you taking care of the issue, but I do not think that forcing UTF-8 is a solution.

Some kind of handling of Git repositories where file names are not UTF-8 would probably need to be added, yes.
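A first step might be simply detecting such names. A minimal sketch, assuming the name is available as the raw bytes stored in the repository (the function name is_utf8 is made up here, and what Git should then do with a name that fails the check is left open):

```python
def is_utf8(name: bytes) -> bool:
    """Return True if the raw file name is well-formed UTF-8."""
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The same visible name, stored by two differently configured systems.
assert is_utf8("räksmörgås".encode("utf-8"))
assert not is_utf8("räksmörgås".encode("latin-1"))
```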

--
\\// Peter - http://www.softwolves.pp.se/