Re: [PATCH v7 0/7] convert: add support for different encodings

Jeff King <peff@xxxxxxxx> · Thu, 1 Mar 2018 02:49:19 -0500

On Wed, Feb 28, 2018 at 09:42:27AM -0800, Junio C Hamano wrote:

> > I also think we'd want a plan for this to be used consistently in other
> > diff-like tools. E.g., "git blame" uses textconv for the starting file
> > content, and it would be nice for this to kick in then, too. Ditto for
> > things like grep, pickaxe, etc.
> 
> You probably do not want to limit your thinking to the generation
> side.  It is entirely plausible to have "we deal with diff in this
> encoding X" in addition to "the in-repo encoding for this project is
> this encoding Y" and "the working tree encoding for this path is Z"
> and allow them to interact in "git diff | git apply" pipeline.
> 
> "diff/format-patch --stdout/etc." on the upstream would first iconv
> Y to X and feed the contents in X to xdiff machinery, which is sent
> down the pipe and received by apply, which reads the preimage from
> the disk or from the repository.  If doing "apply" without
> "--cached/--index", the preimage data from the disk would go through
> iconv Z to X.  If doing "apply --cached/--index", the preimage data
> from the repo would go through iconv Y to X.  The incoming patch is
> in X, so we apply, and the resulting postimage will be re-encoded in
> Z in the working tree and Y in the repository.
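
To make the quoted flow concrete, here is a minimal Python sketch of the
three-encoding round trip for "apply" without "--cached/--index". Nothing
here is git code. The encoding choices and the names reencode, apply_patch,
and apply_to_worktree are hypothetical, and the "patch" is a single
byte-string substitution standing in for the real apply machinery.

```python
# Hypothetical encodings, chosen only for illustration.
DIFF_ENC = "utf-8"           # X: encoding the diff is expressed in
REPO_ENC = "utf-16-le"       # Y: in-repo encoding
WORKTREE_ENC = "iso-8859-1"  # Z: working tree encoding for this path


def reencode(data: bytes, src: str, dst: str) -> bytes:
    """Convert bytes from encoding src to encoding dst (an iconv stand-in)."""
    return data.decode(src).encode(dst)


def apply_patch(preimage_x: bytes, old_x: bytes, new_x: bytes) -> bytes:
    """Toy patch: replace the first occurrence of old_x with new_x.

    All three arguments are in the diff encoding X.
    """
    if old_x not in preimage_x:
        raise ValueError("patch does not apply")
    return preimage_x.replace(old_x, new_x, 1)


def apply_to_worktree(worktree_bytes: bytes, old_x: bytes, new_x: bytes):
    """Read the preimage from disk (Z), apply in X, write back as Z and Y."""
    preimage_x = reencode(worktree_bytes, WORKTREE_ENC, DIFF_ENC)
    postimage_x = apply_patch(preimage_x, old_x, new_x)
    worktree_out = reencode(postimage_x, DIFF_ENC, WORKTREE_ENC)
    repo_out = reencode(postimage_x, DIFF_ENC, REPO_ENC)
    return worktree_out, repo_out
```

Even in this toy form, every step can fail on content that does not
round-trip between two of the encodings, which is one place the corner
cases would come from.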

I agree that would be convenient, but I have to wonder if all the
complexity is worth it to maintain the idea of a distinct in-repo
representation. It seems like it would open up a ton of corner cases.
And I suspect most people would be happy enough with either a
clean/smudge style worktree conversion or a textconv-style view.

So if somebody wants to work on it, I don't want to stop them. But I
think there's room for the simpler solutions in the meantime.

-Peff