On Thu, Oct 27, 2011 at 05:03:29PM -0700, Junio C Hamano wrote:

> Jeff King <peff@xxxxxxxx> writes:
>
> > My interest is to make things like bare-repository diff (and everything
> > built on it; i.e., things like github, gitweb, or whatever) do the sane
> > thing for these people, even if I think what they're doing is wrong.
>
> I do not think we are talking about right or wrong. I was primarily saying
> that textconv may not be the right thing (think github/gitweb showing blob
> contents, nicely formatted inside the chrome the site provides).

But I think it is probably a wrong thing to store utf-16 as the
canonical format inside the git repository. Git simply can't handle it
for diffing. And the right thing, as you suggested, is clean/smudge.
But I'm dealing with repositories on the server side, where it is too
late to do clean/smudge; I just get whatever junk people committed.

> We have in-repository representation that diff and grep and friends work
> on, and output conversion layer that externalizes the result of them in
> the form of "smudge". Another layer above the in-repository representation
> and below operations could convert UTF-16 to UTF-8 when going outward and
> in the opposite when going inward.

I'm not sure that could sanely be done in a backwards compatible way.
Doing it with just textual diffs is a hack, of course, but at least we
know that the damage is limited, and the diff we generate on top doesn't
care that much about the original sha1s[1]. But should read_object_sha1
learn to convert utf-16 into utf-8? I think madness lies that way, as we
are breaking assumptions about sha1 validity.

-Peff

[1] Actually, the text diff does mention the original and resulting
sha1s, which would now either bear no relation to the diff text, or
bear no relation to what's in the repo. Either way, I think we are
creating something that can't necessarily be applied, which is bad.
And that is why I thought of textconv, which is basically the same
concept (and has the same problems).
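For illustration only, here is a minimal sketch of the display-only textconv route discussed above, written in Python. The driver name `utf16`, the script name `utf16conv.py`, and the assumption that BOM-less files are little-endian are all hypothetical choices, not anything from this thread. Because git diffs the converted text while the `index` line still names the original blob sha1s, the output has exactly the "can't necessarily be applied" problem from the footnote.

```python
#!/usr/bin/env python3
"""Hypothetical textconv helper: print a UTF-16 blob as UTF-8 for diffing.

Illustrative wiring (driver and script names are assumptions):
    .gitattributes:  *.txt diff=utf16
    git config diff.utf16.textconv "python3 utf16conv.py"
"""
import codecs
import sys


def utf16_to_utf8(data: bytes) -> bytes:
    """Decode UTF-16 bytes and re-encode them as UTF-8."""
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM to pick the byte order and drops it.
        text = data.decode("utf-16")
    else:
        # Assumption: files without a BOM are little-endian.
        text = data.decode("utf-16-le")
    return text.encode("utf-8")


if __name__ == "__main__" and len(sys.argv) == 2:
    # git appends one argument to the textconv command: the path of a
    # file (often a temporary one) holding the blob contents.
    with open(sys.argv[1], "rb") as f:
        sys.stdout.buffer.write(utf16_to_utf8(f.read()))
```

This only changes what `git diff` shows; the stored blobs stay UTF-16, so nothing about sha1 validity is touched.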