On Fri, 22 Oct 2010, Drew Northup wrote:
> On Fri, 2010-10-22 at 10:48 -0700, Jakub Narebski wrote:
> > Drew Northup <drew.northup@xxxxxxxxx> writes:
> >
> > > Well I shall plumb the documentation again.... just in case. I'm not
> > > holding my breath that it will do what I (and frankly a fair number of
> > > other people) want. We just want version control that treats text like
> > > text. FULL STOP. Why isn't UTF-16 text???????
> >
> > If you are asking why Git detects files with text in UTF-16 / UCS-2 as
> > binary, it is because Git (re)uses the same heuristic as e.g. GNU
> > diff (and probably also the -T file test in Perl), and one of the
> > heuristics is that if a file contains a NUL ("\0") character, then it
> > is most probably binary (because legacy C programs for text would have
> > trouble with NUL characters).
> >
> > That probably doesn't help you any...
>
> I did find that already. I still have not decided on the correct place to
> shoehorn in Unicode detection, but I'll be sure to do that before I
> bother anybody else with it. I already wrote code to detect (reasonably)
> valid UTF-16 (if it isn't obviously valid then I'll just as soon deal
> with it as binary data, so as to avoid a foot-shooting exercise).
> My main motivation here has been to get some feedback as I write stuff
> so as to not waste a lot of time writing something that could be
> done better.
>
> (As opposed to not done at all, which is the feeling I'm getting from a
> few people around here...)

Git has good support for different encodings of commit messages (which
are always text, as opposed to file contents, which might be binary or
text). You specify the encoding you use for commit messages with
i18n.commitEncoding (defaults to 'utf-8'); if it is different from
utf-8, it gets saved in the 'encoding' header.
You can even specify that the encoding your terminal uses is different
from i18n.commitEncoding, with i18n.logOutputEncoding.

The only support for different encodings of file contents is in
git-gui. You declare the encoding a file uses via .gitattributes (the
`encoding` attribute), and you specify the encoding git-gui (Tcl/Tk)
uses with the `gui.encoding` config variable.

I guess that what you need in order to support diffs, 'git show <file>',
etc. is to respect the `encoding` gitattribute, and to provide the
encoding that the console uses with e.g. i18n.blobOutputEncoding (or
something like that).

HTH
-- 
Jakub Narebski
Poland
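
P.S. A minimal sketch, in Python rather than Git's C, of the NUL
heuristic mentioned above and of why UTF-16 text trips it. This is an
illustration only, not Git's actual code: the names `looks_binary` and
`guess_utf16` and the 8000-byte scan window are assumptions of this
sketch, and the BOM check is just one conservative way to accept only
"obviously valid" UTF-16 and treat everything else as binary.

```python
# Illustration only: mimics the NUL-byte heuristic; this is not Git's code.
FIRST_FEW_BYTES = 8000  # scan window assumed for this sketch


def looks_binary(data: bytes, window: int = FIRST_FEW_BYTES) -> bool:
    """Treat a buffer as binary if a NUL byte occurs in its first bytes."""
    return b"\0" in data[:window]


def guess_utf16(data: bytes):
    """Return "utf-16" if data starts with a BOM and decodes cleanly, else None."""
    if data[:2] not in (b"\xff\xfe", b"\xfe\xff"):
        return None  # no BOM: not obviously UTF-16, leave it as binary
    try:
        data.decode("utf-16")  # this codec uses the BOM to pick the byte order
    except UnicodeDecodeError:
        return None  # truncated code unit or unpaired surrogate
    return "utf-16"


if __name__ == "__main__":
    sample = "hello\n"
    # ASCII characters in UTF-16 carry a zero byte each, hence "binary".
    print(looks_binary(sample.encode("utf-8")))
    print(looks_binary(sample.encode("utf-16")))
    print(guess_utf16(sample.encode("utf-16")))
```

Requiring a BOM rejects BOM-less UTF-16 files, which is the cautious
side of the trade-off: such files simply stay classified as binary.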