Johannes Schindelin:
No. As far as Git is concerned, the file names are just as much blobs as
the file contents.
I've struggled with the same problems on Linux before, since its file
systems don't have the concept of characters, either. I guess it's just
design principles, but as far as I am concerned, having file names be
constructed from characters makes a lot more sense than having them
constructed from bytes.
Git does the right thing in assuming that commit messages and user names
are UTF-8 characters, though; it would have been nice to have file names
covered by the same constraints.
The fact that Windows messes with this notion just as it messes with the
file contents (think the endless story whose name is CR/LF) shows only how
"well" designed the concepts in Windows are.
In this case, yes, Windows' way of doing things does make more sense, at
least to me. And as far as text files are concerned, treating text as
sequences of bytes is in most cases not a very smart thing to do, either,
but it's hard not to, given how most computers are constructed.
And as it stands, we have at least two issues on the msysGit issue tracker
that complain that Git does not work with localized file names properly.
So no, file names are not UTF-8 at all, especially not on Windows.
I am not trying to make file names *on Windows* be UTF-8. I am trying to
make file names on Windows be Windows file names, i.e. UTF-16 Unicode. It's
just that since Git internally uses the char* APIs and, from what I have
seen, in most other cases assumes that char* text is UTF-8, I am trying to
convert from Windows' view of path names to Git's (UTF-16 to UTF-8) and back.
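
As a rough illustration of that round trip, here is a minimal sketch in
Python. It is not Git's or msysGit's actual code; the function names
windows_to_git and git_to_windows are made up for this example, and it
assumes the Windows name is well-formed UTF-16.

```python
def windows_to_git(path_utf16: bytes) -> bytes:
    """Hypothetical helper: UTF-16LE path bytes -> UTF-8 bytes for char* use."""
    return path_utf16.decode("utf-16-le").encode("utf-8")


def git_to_windows(path_utf8: bytes) -> bytes:
    """Hypothetical helper: UTF-8 path bytes -> UTF-16LE bytes for wide APIs."""
    return path_utf8.decode("utf-8").encode("utf-16-le")


# A name with characters from more than one script.
name = "räksmörgås/日本語.txt"
wide = name.encode("utf-16-le")      # what the file system would hand back
narrow = windows_to_git(wide)        # what the char* side would see

# Converting back yields the original UTF-16 bytes, so nothing is lost.
round_trip_ok = git_to_windows(narrow) == wide
```

Both encodings cover all of Unicode, which is why the conversion is
lossless in either direction for well-formed input.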
The other way would be to keep the char* APIs but convert to the Windows
locale encoding ("ANSI codepage"), but that will break horribly as not all
file names that can be used on a file system can be represented as such.
Plus, every call to a Windows API using a char* path name *is* converted into
UTF-16 anyway, since that is what is used internally in the Windows NT
subsystems.
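
To make the code page problem concrete, here is a small sketch in Python.
It uses Python's own cp1252 codec as a stand-in for a Western European
"ANSI" code page rather than calling any Windows API, and the helper name
representable is made up for this example.

```python
def representable(name: str, codepage: str) -> bool:
    """Return True if every character of name can be encoded in codepage."""
    try:
        name.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


names = ["readme.txt", "räksmörgås.txt", "日本語.txt"]

# cp1252 stands in for the locale's ANSI code page (an assumption).
in_ansi = {n: representable(n, "cp1252") for n in names}

# UTF-8 can encode every one of these names.
in_utf8 = {n: representable(n, "utf-8") for n in names}
```

With these inputs the Japanese name cannot be encoded in cp1252, while all
three names can be encoded in UTF-8.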
Do not get me wrong, I really welcome you taking care of the issue, but I
do not think that forcing UTF-8 is a solution.
Some kind of handling of Git repositories where file names are not UTF-8
would probably need to be added, yes.
--
\\// Peter - http://www.softwolves.pp.se/