2012/2/22 Jeff King <peff@xxxxxxxx>:
> On Tue, Feb 21, 2012 at 09:24:52PM +0700, Nguyen Thai Ngoc Duy wrote:
>
>> Commit object has its own format, which happens to be in ascii, but
>> not really subject to re-encoding.
>>
>> There are only four areas that may be re-encoded: author line,
>> committer line, mergetag lines and commit body. Encoding of tags
>> embedded in mergetag lines is not decided by commit encoding, so leave
>> it out and consider it binary.
>
> Is this worth the effort? Yes, re-encoding the ASCII bits of the commit
> object is unnecessary. But do we actually handle encodings that are not
> ASCII supersets? IOW, I could see the point if this is making it
> possible to hold utf-16 names and messages in your commits (though why
> you would want to do so is beyond me...). But my understanding is that
> this is horribly broken anyway by other parts of the code. And even
> looking at your code below:

No, utf-16 and friends are out of the question. 617 of the 1168 encodings
supported by iconv translate chars 10,32-126 to something else, and some
of them do not generate NUL. I suppose none of these are actually used
nowadays. Looking again, some don't even successfully translate the
given input.

No, it's probably not worth the effort.
--
Duy
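
For anyone who wants to repeat the "ASCII superset" check discussed above, here is a minimal sketch. It assumes Python's built-in codecs as a stand-in for iconv, so the counts will not match the 617/1168 figure. The helper name `is_ascii_superset` and the `PROBE` string are hypothetical and appear neither in git nor in the patch under discussion.

```python
# Characters named in the email: LF (10) and printable ASCII (32-126).
PROBE = "\n" + "".join(chr(c) for c in range(32, 127))


def is_ascii_superset(encoding: str) -> bool:
    """Return True if `encoding` encodes the probe characters to the
    same bytes that ASCII does (hypothetical helper, Python codecs only)."""
    try:
        return PROBE.encode(encoding) == PROBE.encode("ascii")
    except (LookupError, UnicodeError):
        # Unknown codec, or one that cannot represent a probe character.
        return False


if __name__ == "__main__":
    for name in ("utf-8", "latin-1", "utf-16", "cp037"):
        print(name, is_ascii_superset(name))
```

An encoding that fails this check would rewrite the ASCII header fields of a commit object if the whole buffer were re-encoded, which is the case the quoted patch tries to avoid.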