[RFC] Print diffs of UTF-16 to console / patches to email as UTF-8...?

Drew Northup <drew.northup@xxxxxxxxx> · Fri, 22 Oct 2010 12:06:48 -0400

I am currently thinking about what the best way to present readable (and
safely email-able) patches to the user may be when the content is
UTF-16. This is part of my ongoing work to treat UTF-16 as text (in
other words, the crlf options will work and .gitattributes hacks won't
be required to display diffs, etc).
I was also concerned that the result be re-importable to valid UTF-16 in
the end. This has led me to consider printing diffs as UTF-8 (no data
loss, at least 16->8) when the source text is UTF-16. This should also
be git-gui / gitk friendly (in theory). I would favor making this a
configurable option (export_unicode_diff_as_utf8 ?), leaving plain
UTF-16 output as the standard output from "git diff" (once I convince it
that UTF-16 is indeed text).
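
To make the "no data loss, re-importable" requirement concrete, here is a
minimal sketch of the round trip in Python (not git's C code, and not a
proposed implementation). The helper names utf16_to_utf8 and utf8_to_utf16
are hypothetical, and the byte order is assumed to be known in advance.

```python
def utf16_to_utf8(raw: bytes, byteorder: str = "le") -> bytes:
    """Transcode UTF-16 bytes to UTF-8, e.g. for showing a diff."""
    # An explicit "-le"/"-be" codec does not strip a leading BOM; it is
    # kept as U+FEFF, so it survives the trip through UTF-8.
    # The default strict error handling raises on unpaired surrogates,
    # so malformed input is rejected instead of being silently altered.
    return raw.decode("utf-16-" + byteorder).encode("utf-8")


def utf8_to_utf16(raw: bytes, byteorder: str = "le") -> bytes:
    """Transcode UTF-8 bytes back to UTF-16 in the given byte order."""
    return raw.decode("utf-8").encode("utf-16-" + byteorder)


# Sample content: BOM, non-ASCII text, a character outside the BMP, CRLF.
original = b"\xff\xfe" + "na\u00efve \U0001D11E\r\n".encode("utf-16-le")
as_utf8 = utf16_to_utf8(original)
restored = utf8_to_utf16(as_utf8)
assert restored == original
```

Since the CRLF and the BOM both come back byte-for-byte in this sketch, a
patch carried as UTF-8 could in principle be turned back into the original
UTF-16 file content, provided the byte order is recorded somewhere.
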
Also, there is the issue of being able to recognize UTF-16 as UTF-16 in
diffs/patches. Is there a precedent/standard I should be aware of with
respect to BOMs and patches? I would think that adhering to the UTF-16
standard with respect to whole text files would make sense here (no BOM
== Big Endian, BOM used to match LE/BE otherwise).
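
The BOM rule in the parenthetical above can be sketched as follows, again
in Python and only as an illustration. The function name
detect_utf16_byteorder is hypothetical, and the sketch assumes the input is
already known to be UTF-16 (it does not try to tell UTF-16 apart from
other encodings or from binary data).

```python
import codecs
from typing import Tuple


def detect_utf16_byteorder(raw: bytes) -> Tuple[str, int]:
    """Return (codec name, number of BOM bytes to skip) for UTF-16 input."""
    if raw.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "utf-16-le", 2
    if raw.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "utf-16-be", 2
    # No BOM: fall back to big endian, as described above.
    return "utf-16-be", 0


raw = b"\xff\xfeh\x00i\x00"
codec, skip = detect_utf16_byteorder(raw)
text = raw[skip:].decode(codec)
assert text == "hi"
```
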

Comments welcome!

-- 
-Drew Northup N1XIM
   AKA RvnPhnx on OPN
________________________________________________
"As opposed to vegetable or mineral error?"
-John Pescatore, SANS NewsBites Vol. 12 Num. 59
