On Mon, Feb 26, 2018 at 03:46:35PM -0500, Jeff King wrote:
> On Mon, Feb 26, 2018 at 06:35:33PM +0100, Torsten Bögershausen wrote:
>
> > > diff --git a/userdiff.c b/userdiff.c
> > > index dbfb4e13cd..48fa7e8bdd 100644
> > > --- a/userdiff.c
> > > +++ b/userdiff.c
> > > @@ -161,6 +161,7 @@ IPATTERN("css",
> > >  	 "-?[_a-zA-Z][-_a-zA-Z0-9]*" /* identifiers */
> > >  	 "|-?[0-9]+|\\#[0-9a-fA-F]+" /* numbers */
> > >  ),
> > > +{ "utf16", NULL, -1, { NULL, 0 }, NULL, "iconv:utf16" },
> > >  { "default", NULL, -1, { NULL, 0 } },
> > >  };
> > >  #undef PATTERNS
> >
> > The patch looks like a possible step into the right direction -
> > some minor notes: "utf8" is better written as "UTF-8", when talking
> > to iconv.h, same for utf16.
> >
> > But, how do I activate the diff ?
> > I have in .gitattributes
> > XXXenglish.txt diff=UTF-16
> >
> > and in .git/config
> > [diff "UTF-16"]
> >       command = iconv:UTF-16
> >
> >
> > What am I doing wrong ?
>
> After applying the patch, if I do:
>
>   git init
>   echo hello | iconv -f utf8 -t utf16 >file
>   git add file
>   git commit -m one
>   echo goodbye | iconv -f utf8 -t utf16 >file
>   git add file
>   git commit -m two
>
> then:
>
>   git log -p
>
> shows "binary files differ" but:
>
>   echo "file diff=utf16" >.gitattributes
>   git log -p
>
> shows text diffs. I assume you tweaked the patch before switching to
> the UTF-16 spelling in your example. Did you use a plumbing command to
> show the diff? textconv isn't enabled for plumbing, because the
> resulting patches cannot actually be applied (in that sense an encoding
> switch is potentially special, since in theory one could convert to the
> canonical text format, apply the patch, and then convert back).
>
> -Peff

Thanks for helping me out.
I didn't use "git log -p", but a simple "git diff".
(And after switching back to the lowercase "utf16" spelling, it works as you described.)
I wasn't aware of "git log -p", something learned (or re-learned).

The other question is:
Would this help show diffs of UTF-16 encoded files on a "git hoster"
(github/bitbucket/...)?
Or would the auto-magic UTF-16 avoid-binary patch that I sent out be more helpful?
Or both?
Or the w-t-e encoding?
Questions over questions.