On Wed, 23 Jan 2008, Steffen Prohaska wrote:
> On Jan 23, 2008, at 6:55 AM, Junio C Hamano wrote:
>> "Shawn O. Pearce" <spearce@xxxxxxxxxxx> writes:
>>
>>> git-gui: Use gitattribute "encoding" for file content display
>>>
>>> Most folks using git-gui on internationalized files have complained
>>> that it doesn't recognize UTF-8 correctly. In the past we have just
>>> ignored the problem and showed the file contents as binary/US-ASCII,
>>> which is wrong no matter how you look at it.
>>
>> Hmmm.
>>
>> At least for now in 1.5.4, I'd prefer the way gitk shows UTF-8
>> (if I recall correctly latin-1 or other legacy encoding, as long
>> as LANG/LC_* is given appropriately, as well) contents without
>> per-path configuration without introducing new attributes.
>
> Shouldn't we first try harder to get things right without adding
> an attribute? Maybe we could continue a good tradition and look
> at the content of the first: we could first look for hints in the
> file about the encoding. XML and many text files contain such
> hints already to help editors. For example, Python source can
> explicitly contain the encoding [1]; and I guess there are many
> other examples.

For example, LaTeX files either use the inputenc package to set the
encoding (e.g. \usepackage[latin2]{inputenc}) or use a magic first line
to specify a TCX (TeX character translation) file
(e.g. %& -translate-file=il2-t1). Emacs encourages the use of file
variables, either in the form of a magic first line or as file variables
at the end of the file; I think the same is true for Vim.

I'd then like this to be at least as configurable as diff.*.funcname
is for diff.

> If we don't find a direct hint, we could have
> some magic auto-detection similar to what we do for autocrlf.

We can at least try to check for the UTF-16 magic first two bytes, and
detect whether we have a character which is invalid in UTF-8 (for
performance, I guess, checking only the beginning of the file)...

> As a fallback the user could specify a default encoding.
> But only as a last resort, I'd use explicit attributes.

...and then falling back to the fallback encoding, like gitweb does.

-- 
Jakub Narebski
Poland
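
The detection order discussed in this message (in-file hint, then the
UTF-16 magic first two bytes, then a UTF-8 validity check on the
beginning of the file, then a user-specified fallback) could be sketched
roughly as follows. This is only an illustration in Python, not git-gui
or gitweb code: the function name guess_encoding, the 8192-byte probe
size and the latin-1 default are all assumptions, and the only in-file
hint it understands is a Python-style coding declaration on the first
two lines.

```python
import codecs
import re

# Python-style declaration such as "# -*- coding: latin-1 -*-".
# Other hint formats (XML, LaTeX inputenc, Emacs/Vim variables) are not handled.
CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def guess_encoding(data: bytes, fallback: str = "latin-1", probe: int = 8192) -> str:
    """Guess the encoding of file content (hypothetical helper)."""
    head = data[:probe]  # only look at the beginning, for performance

    # 1. Explicit hint inside the file (first two lines only).
    for line in head.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1).decode("ascii")

    # 2. UTF-16 byte order mark in the first two bytes.
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"

    # 3. Is the probed part valid UTF-8? If the probe cut the file short,
    #    tolerate an incomplete multi-byte sequence at the very end.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(head, final=len(data) <= probe)
        return "utf-8"
    except UnicodeDecodeError:
        # 4. Last resort: the user-specified default encoding.
        return fallback
```

A caller would pass the raw blob contents and the configured default,
e.g. guess_encoding(blob, fallback="iso-8859-2"). Pure ASCII content is
reported as UTF-8, which is harmless since ASCII is a subset of it.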