Re: [RFC] Print diffs of UTF-16 to console / patches to email as UTF-8...?

On Fri, 2010-10-22 at 10:48 -0700, Jakub Narebski wrote:
> Drew Northup <drew.northup@xxxxxxxxx> writes:

> > Well, I shall plumb the documentation again... just in case. I'm not
> > holding my breath that it will do what I (and frankly a fair number of
> > other people) want. We just want version control that treats text like
> > text. FULL STOP. Why isn't UTF-16 text?
> 
> If you are asking why Git detects files with text in UTF-16 / UCS-2 as
> binary, it is because Git (re)uses the same heuristics as e.g. GNU
> diff (and probably also the -T file test in Perl), and one of those
> heuristics is that if a file contains a NUL ("\0") character, then it
> is most probably binary (because legacy C programs for text would have
> trouble with NUL characters).
> 
> That probably doesn't help you any...
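
To make the NUL heuristic concrete, here is a minimal Python sketch. It is not Git's code (Git does this in C). The 8000-byte prefix limit is an assumption modeled on the `FIRST_FEW_BYTES` constant Git uses for this check, and `looks_binary` is a hypothetical name. The point it shows: UTF-16 stores every ASCII character as two bytes, one of which is 0x00, so ordinary English text in UTF-16 trips the "contains NUL" test immediately.

```python
# Sketch of the "contains NUL => probably binary" heuristic described above.
# Assumption: only a prefix of the file is inspected; 8000 bytes mirrors
# Git's FIRST_FEW_BYTES, but treat the exact value as illustrative.
FIRST_FEW_BYTES = 8000


def looks_binary(data: bytes, limit: int = FIRST_FEW_BYTES) -> bool:
    """Return True if a NUL byte appears within the first `limit` bytes."""
    return b"\x00" in data[:limit]


if __name__ == "__main__":
    ascii_text = "hello, world\n".encode("ascii")
    utf16_text = "hello, world\n".encode("utf-16-le")
    print(looks_binary(ascii_text))  # prints False
    print(looks_binary(utf16_text))  # prints True: each ASCII char gains a 0x00 byte
```

Because only a prefix is scanned, a NUL that first appears after the limit is not seen at all; that is a speed-versus-accuracy trade-off rather than a bug.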

I did find that already. I still have not decided on the correct place to
shoehorn in Unicode detection, but I'll be sure to do that before I
bother anybody else with it. I already wrote code to detect (reasonably)
valid UTF-16 (if it isn't obviously valid then I'd just as soon deal
with it as binary data, so as to avoid a foot-shooting exercise).
My main motivation here has been to get some feedback as I write this,
so as not to waste a lot of time writing something that could be
done better.
(As opposed to not done at all, which is the feeling I'm getting from a
few people around here...)
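
The detection code mentioned above is not included in the message, so the following is only a hypothetical Python sketch of what "reasonably valid UTF-16, otherwise treat as binary" could look like. The policy choices are assumptions, not Drew's: a byte-order mark (BOM) is required, the byte count must be even, surrogates must be correctly paired, and an embedded U+0000 is rejected as "not obviously text". Anything that fails returns `None`, meaning "fall back to the binary code path". The name `detect_utf16` is made up for illustration.

```python
from typing import Optional


def detect_utf16(data: bytes) -> Optional[str]:
    """Return "utf-16-le" or "utf-16-be" if `data` is BOM-marked,
    well-formed UTF-16; otherwise None ("treat it as binary")."""
    # Assumption: without a BOM we do not guess; the input is not *obviously* UTF-16.
    if data.startswith(b"\xff\xfe"):
        byteorder, name = "little", "utf-16-le"
    elif data.startswith(b"\xfe\xff"):
        byteorder, name = "big", "utf-16-be"
    else:
        return None
    body = data[2:]
    if len(body) % 2:
        return None  # an odd byte count cannot be UTF-16
    units = [int.from_bytes(body[i:i + 2], byteorder)
             for i in range(0, len(body), 2)]
    i = 0
    while i < len(units):
        u = units[i]
        if u == 0:
            return None  # embedded U+0000 (also rejects a UTF-32LE BOM)
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: must be immediately followed by a low surrogate.
            if i + 1 >= len(units) or not 0xDC00 <= units[i + 1] <= 0xDFFF:
                return None
            i += 2
            continue
        if 0xDC00 <= u <= 0xDFFF:
            return None  # low surrogate with no preceding high surrogate
        i += 1
    return name
```

Erring toward `None` matches the "avoid a foot-shooting exercise" goal: a false "binary" verdict just means the old behaviour, while a false "text" verdict would mangle diffs.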
-- 
-Drew Northup N1XIM
   AKA RvnPhnx on OPN
________________________________________________
"As opposed to vegetable or mineral error?"
-John Pescatore, SANS NewsBites Vol. 12 Num. 59

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

