Re: [RFC] Print diffs of UTF-16 to console / patches to email as UTF-8...?

On Fri, 22 Oct 2010, Drew Northup wrote:
> On Fri, 2010-10-22 at 10:48 -0700, Jakub Narebski wrote:
> > Drew Northup <drew.northup@xxxxxxxxx> writes:
> 
> > > Well I shall plumb the documentation again.... just in case. I'm not
> > > holding my breath that it will do what I (and frankly a fair number of
> > > other people) want. We just want version control that treats text like
> > > text. FULL STOP. Why isn't UTF-16 text???????
> > 
> > If you are asking why Git detects files with text in UTF-16 / UCS-2 as
> > binary, it is because Git (re)uses the same heuristic as e.g. GNU
> > diff (and probably also the -T file test in Perl), and one of the
> > heuristics is that if a file contains a NUL ("\0") character, then it
> > is most probably binary (because legacy C programs for text would
> > have trouble with NUL characters).
> > 
> > That probably doesn't help you any...
> 
> I did find that already. I still have not decided on the correct place to
> shoehorn in Unicode detection, but I'll be sure to do that before I
> bother anybody else with it. I already wrote code to detect (reasonably)
> valid UTF-16 (if it isn't obviously valid then I'll just as soon deal
> with it as binary data, so as to avoid a foot-shooting exercise).
> My main motivation here has been to get some feedback as I write stuff
> so as to not waste a lot of time writing something that could be
> done better.
>
> (As opposed to not done at all, which is the feeling I'm getting from a
> few people around here...)

Git has good support for different encodings in commit messages (which
are always text, as opposed to file contents, which might be binary or
text).

You specify the encoding you use to write commit messages with
i18n.commitEncoding (defaults to 'utf-8'); if it is something other than
utf-8, it gets saved in the 'encoding' header of the commit.  If your
terminal uses an encoding different from i18n.commitEncoding, you can
specify that with i18n.logOutputEncoding.
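
Conceptually, this amounts to a re-encoding step between the stored
message and the terminal.  A rough Python sketch of the idea follows (an
illustration only, not Git's actual code; the function name and the
errors="replace" policy are my own assumptions):

```python
def reencode_message(raw: bytes, commit_encoding: str = "utf-8",
                     output_encoding: str = "utf-8") -> bytes:
    """Re-encode a stored commit message for display.

    commit_encoding stands for what the 'encoding' header (or its utf-8
    default) declares; output_encoding stands for what the terminal expects.
    """
    # Decode strictly: a message that does not match its declared
    # encoding raises UnicodeDecodeError instead of being silently mangled.
    text = raw.decode(commit_encoding)
    # Characters the terminal encoding cannot represent become '?'.
    return text.encode(output_encoding, errors="replace")


# A message stored as ISO-8859-2, shown on a UTF-8 terminal.
stored = "Zażółć gęślą jaźń".encode("iso-8859-2")
shown = reencode_message(stored, "iso-8859-2", "utf-8")
```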

The only existing support for different encodings of file contents is in
git-gui.  You declare the encoding that a file uses via .gitattributes
(the `encoding` attribute).  You specify what output encoding git-gui
(Tcl/Tk) uses with the `gui.encoding` config variable.

I guess that what you need in order to support diffs, 'git show <file>',
etc. is to respect the `encoding` gitattribute, and to provide the
encoding that the console uses with e.g. i18n.blobOutputEncoding (or
something like that).
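
To make the idea concrete, here is a minimal Python sketch (not a patch
against Git) of both halves: the NUL-byte heuristic that makes UTF-16
look binary, and a diff that honors a declared blob encoding and writes
in the console's encoding.  The names looks_binary and text_diff are
hypothetical, the two encoding parameters stand in for the `encoding`
attribute and the proposed output setting, and the 8000-byte window
reflects my understanding of Git's check rather than something I have
verified for every version:

```python
import difflib

# Assumption: only the first 8000 bytes are inspected for NUL.
FIRST_FEW_BYTES = 8000


def looks_binary(data: bytes) -> bool:
    """NUL-byte heuristic: any "\\0" near the start means 'binary'."""
    return b"\0" in data[:FIRST_FEW_BYTES]


def text_diff(old: bytes, new: bytes, blob_encoding: str,
              output_encoding: str = "utf-8") -> bytes:
    """Diff two blobs as text, given the encoding declared for the file.

    Decoding is strict, so data that is not valid in blob_encoding raises
    UnicodeDecodeError; a caller could then fall back to binary handling.
    """
    old_lines = old.decode(blob_encoding).splitlines(keepends=True)
    new_lines = new.decode(blob_encoding).splitlines(keepends=True)
    diff = "".join(difflib.unified_diff(old_lines, new_lines,
                                        "a/file", "b/file"))
    # Emit the diff in the console's encoding, not the blob's.
    return diff.encode(output_encoding, errors="replace")


old_blob = "hello\nworld\n".encode("utf-16")
new_blob = "hello\nthere\n".encode("utf-16")

flagged = looks_binary(old_blob)                   # ASCII text in UTF-16 contains NULs
patch = text_diff(old_blob, new_blob, "utf-16")    # UTF-8 bytes for the console
```

With the encoding declared, the same blobs that trip the heuristic
produce an ordinary unified diff in UTF-8.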

HTH
-- 
Jakub Narebski
Poland