On Wed, Feb 28, 2018 at 09:42:27AM -0800, Junio C Hamano wrote:

> > I also think we'd want a plan for this to be used consistently in other
> > diff-like tools. E.g., "git blame" uses textconv for the starting file
> > content, and it would be nice for this to kick in then, too. Ditto for
> > things like grep, pickaxe, etc.
>
> You probably do not want to limit your thinking to the generation
> side. It is entirely plausible to have "we deal with diff in this
> encoding X" in addition to "the in-repo encoding for this project is
> this encoding Y" and "the working tree encoding for this path is Z"
> and allow them to interact in "git diff | git apply" pipeline.
>
> "diff/format-patch --stdout/etc." on the upstream would first iconv
> Y to X and feed the contents in X to xdiff machinery, which is sent
> down the pipe and received by apply, which reads the preimage from
> the disk or from the repository. If doing "apply" without
> "--cached/--index", the preimage data from the disk would go through
> iconv Z to X. If doing "apply --cached/--index", the preimage data
> from the repo would go through iconv Y to X. The incoming patch is
> in X, so we apply, and the resulting postimage will be re-encoded in
> Z in the working tree and Y in the repository.

I agree that would be convenient, but I have to wonder if all the
complexity is worth it to maintain the idea of a distinct in-repo
representation. It seems like it would open up a ton of corner cases.
And I suspect most people would be happy enough with either a
clean/smudge style worktree conversion or a textconv-style view.

So if somebody wants to work on it, I don't want to stop them. But I
think there's room for the simpler solutions in the meantime.

-Peff