On Wed, Oct 15, 2008 at 4:38 PM, Paul Mackerras <paulus@xxxxxxxxx> wrote:
> Alexander Gavrilov writes:
>
>> Since git apparently cannot work with filenames in non-locale
>> encodings anyway, I did not try to do anything about it apart
>> from fixing some obvious bugs.
>
> What we did before was read filenames and convert them from the system
> encoding (done implicitly by gets) before unquoting filenames that
> were quoted. What we do now with your patch 1/2 is that we read the
> filenames in binary and unquote any quoted filenames before converting
> from the system encoding. So I don't think your patch would have made
> as much difference as it might appear. If there is a reason for
> unquoting before converting from the system encoding rather than
> after, it seems pretty subtle to me and wasn't explained in the patch
> description. An explanation, preferably with examples, would be
> useful.

The reason is that non-ASCII characters may be quoted too, so the
string that we read looks like
"\204\206\204y\204s\204\200\204r\204y\204~\204p.txt". There is no
point in decoding it before unquoting: the quoted form is pure ASCII,
so decoding it changes nothing, and the octal escapes stand for raw
bytes that can only be converted from the system encoding once
unquoting has turned them back into bytes.

> Also, you didn't say whether you found the "obvious bugs" by
> inspection or by encountering their effects in actual running (and if
> so, what those effects were). That information is also good to have
> in the patch description.

I actually created a test repository with non-ASCII filenames. If I
remember correctly, the bugs manifested as strings in the tree view
appearing as if they had been decoded using ISO-8859-1 (the result of
decoding before unquoting), or as unstaged files being listed in
quoted form as above.

The remaining encoding issues are:

1) Commit messages that are loaded through readcommit are decoded
   using the system encoding. It is rare, but it happens, and it is
   a bug.

   proc readcommit {id} {
       if {[catch {set contents [exec git cat-file commit $id]}]} return
       parsecommit $id $contents 0
   }

2) Gitk cannot process commits stored in multiple different encodings:
   they are all decoded using the current value of i18n.commitencoding.
   This seems to be low priority, because most GUI users are better off
   using UTF-8 for their commits anyway.

Alexander
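
To illustrate the ordering question discussed above (unquote first, then
decode), here is a minimal sketch. It is written in Python rather than
gitk's Tcl, and it is not the code from the patch. The function names
(unquote_c_style, unquote_then_decode, decode_then_unquote), the UTF-8
encoding, and the sample filename are all assumptions chosen for the
example. It only handles three-digit octal escapes and a few simple
backslash escapes.

```python
import re

# Hypothetical helper: undo C-style quoting on a byte string as read from git.
# Only \ooo octal escapes and a few simple escapes are handled in this sketch.
_SIMPLE = {b"n": b"\n", b"t": b"\t", b"\\": b"\\", b'"': b'"'}
_ESCAPE = re.compile(rb'\\([0-3][0-7]{2}|[nt"\\])')


def unquote_c_style(quoted: bytes) -> bytes:
    # Names that are not wrapped in double quotes were not quoted at all.
    if len(quoted) < 2 or not (quoted.startswith(b'"') and quoted.endswith(b'"')):
        return quoted

    def repl(match):
        token = match.group(1)
        if token in _SIMPLE:
            return _SIMPLE[token]
        # An octal escape stands for exactly one raw byte.
        return bytes([int(token, 8)])

    return _ESCAPE.sub(repl, quoted[1:-1])


def unquote_then_decode(raw: bytes, encoding: str = "utf-8") -> str:
    # Order argued for in this thread: recover the raw bytes, then decode them.
    return unquote_c_style(raw).decode(encoding)


def decode_then_unquote(raw: bytes, encoding: str = "utf-8") -> str:
    # The other order: decoding is a no-op on the ASCII escapes, and each
    # octal escape then becomes its own character instead of one byte of a
    # multi-byte sequence.
    text = raw.decode(encoding)[1:-1]
    return re.sub(r"\\([0-3][0-7]{2})", lambda m: chr(int(m.group(1), 8)), text)


if __name__ == "__main__":
    sample = b'"caf\\303\\251.txt"'  # assumed sample: a UTF-8 name, quoted
    print(unquote_then_decode(sample))
    print(decode_then_unquote(sample))
```

With the assumed sample, the first call yields the intended name, while
the second yields text that looks as if the bytes had been read as
ISO-8859-1, which matches the symptom described for the tree view.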