On Fri, 2010-10-22 at 10:48 -0700, Jakub Narebski wrote:
> Drew Northup <drew.northup@xxxxxxxxx> writes:
> > Well I shall plumb the documentation again.... just in case. I'm not
> > holding my breath that it will do what I (and frankly a fair number of
> > other people) want. We just want version control that treats text like
> > text. FULL STOP. Why isn't UTF-16 text???????
>
> If you are asking why Git detects files with text in UTF-16 / UCS-2 as
> binary, it is because Git (re)uses the same heuristic as e.g. GNU
> diff (and probably also the -T file test in Perl), and one of the
> heuristics is that if a file contains a NUL ("\0") character, then it is
> most probably binary (because legacy C programs for text would have
> trouble with NUL characters).
>
> That probably doesn't help you any...

I did find that already. I still have not decided on the correct place to
shoehorn in Unicode detection, but I'll be sure to do that before I bother
anybody else with it. I have already written code to detect (reasonably)
valid UTF-16. If it isn't obviously valid, then I'd just as soon deal with
it as binary data, so as to avoid a foot-shooting exercise.

My main motivation here has been to get some feedback as I write, so as
not to waste a lot of time writing something that could be done better.
(As opposed to not done at all, which is the feeling I'm getting from a
few people around here...)

-- 
-Drew Northup N1XIM
   AKA RvnPhnx on OPN
________________________________________________
"As opposed to vegetable or mineral error?"
-John Pescatore, SANS NewsBites Vol. 12 Num. 59
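For illustration, here is a minimal sketch of both ideas discussed above: the NUL-byte heuristic Jakub describes, and a strict "obviously valid UTF-16" check of the kind Drew mentions. This is not Git's code and not Drew's code. The function names looks_binary() and looks_like_utf16() are hypothetical, the 8000-byte sniff window is an assumption about how much of the file such a heuristic inspects, and the validity rules (mandatory BOM, even length, correctly paired surrogates, no U+0000) are one conservative choice among many.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: only the first few KB are inspected, as such heuristics usually do. */
#define SNIFF_BYTES 8000

/* Hypothetical: NUL-byte heuristic, any '\0' in the window means "binary". */
static int looks_binary(const unsigned char *buf, size_t len)
{
    size_t n = len < SNIFF_BYTES ? len : SNIFF_BYTES;
    return memchr(buf, 0, n) != NULL;
}

/* Read one 16-bit code unit in the given byte order. */
static unsigned read_unit(const unsigned char *p, int big_endian)
{
    return big_endian ? ((unsigned)p[0] << 8) | p[1]
                      : ((unsigned)p[1] << 8) | p[0];
}

/*
 * Hypothetical: returns 1 only for "obviously valid" UTF-16, i.e. a BOM,
 * an even byte count, properly paired surrogates and no U+0000.
 * Anything else returns 0 so the caller can keep treating it as binary.
 */
static int looks_like_utf16(const unsigned char *buf, size_t len)
{
    int big_endian;
    size_t i;

    if (len < 2 || len % 2 != 0)
        return 0;
    if (buf[0] == 0xFE && buf[1] == 0xFF)
        big_endian = 1;               /* UTF-16BE BOM */
    else if (buf[0] == 0xFF && buf[1] == 0xFE)
        big_endian = 0;               /* UTF-16LE BOM */
    else
        return 0;                     /* no BOM: not obviously UTF-16 */

    for (i = 2; i < len; i += 2) {
        unsigned unit = read_unit(buf + i, big_endian);

        if (unit >= 0xD800 && unit <= 0xDBFF) {
            /* High surrogate must be followed by a low surrogate. */
            unsigned next;
            if (i + 3 >= len)
                return 0;
            next = read_unit(buf + i + 2, big_endian);
            if (next < 0xDC00 || next > 0xDFFF)
                return 0;
            i += 2;                   /* skip the low surrogate */
        } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
            return 0;                 /* lone low surrogate */
        } else if (unit == 0) {
            return 0;                 /* embedded U+0000: treat as binary */
        }
    }
    return 1;
}
```

A caller would run looks_like_utf16() first and fall back to looks_binary() only when it returns 0, so that plain ASCII or UTF-8 is unaffected and only well-formed, BOM-marked UTF-16 escapes the NUL test. Requiring a BOM is deliberately strict; it matches the "if it isn't obviously valid, treat it as binary" stance and avoids guessing at byte order.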