On 2018-02-28 14:21, Jeff King wrote: > On Wed, Feb 28, 2018 at 09:20:05AM +0100, Torsten Bögershausen wrote: > >>> 2. auto-detect utf-16 (your patch) >>> - Just Works for existing repositories storing utf-16 >>> >>> - carries some risk of kicking in when people would like it not to >>> (e.g., when they really do want a binary patch that can be >>> applied). >> >> The binary patch is still supported, but that detail may need some more explanation >> in the commit message. Please see t4066-diff-encoding.sh > > Yeah, but if you don't have binary-patches enabled we'd generate a bogus > patch. Which, granted, without that you wouldn't be able to apply the > patch either. But somehow it feels funny to me to generate something > that _looks_ like a patch but you can't actually apply. > > I also think we'd want a plan for this to be used consistently in other > diff-like tools. E.g., "git blame" uses textconv for the starting file > content, and it would be nice for this to kick in then, too. Ditto for > things like grep, pickaxe, etc. > > I have some patches that reuse some of the textconv infrastructure for > this, which should mostly make it "just work" everywhere. They need a > little more polishing before I post them, but you can take a look at: > > https://github.com/peff/git.git jk/textconv-utf16 > > if you want. > > -Peff > Thanks for your work (I actually found some time to take look) I am looking at the code to put 2 or 3 things on top of it: - test case(s) - documentation - teach diff to add a line "b is converted to UTF-8 from UTF-16" - teach apply to reads & understands the encoding line and throws in a "reencode_string_len() like your patch does This would keep "git diff | git apply" happy. All in all the changes do not look too invasive, at least from my point of view.