On Tue, Feb 21, 2012 at 09:24:52PM +0700, Nguyen Thai Ngoc Duy wrote:

> Commit object has its own format, which happens to be in ascii, but
> not really subject to re-encoding.
>
> There are only four areas that may be re-encoded: author line,
> committer line, mergetag lines and commit body. Encoding of tags
> embedded in mergetag lines is not decided by commit encoding, so leave
> it out and consider it binary.

Is this worth the effort? Yes, re-encoding the ASCII bits of the commit
object is unnecessary. But do we actually handle encodings that are not
ASCII supersets?

IOW, I could see the point if this is making it possible to hold utf-16
names and messages in your commits (though why you would want to do so
is beyond me...). But my understanding is that this is horribly broken
anyway by other parts of the code. And even looking at your code below:

> +static char *reencode_commit(const char *buffer,
> +			      const char *out_enc, const char *in_enc)
> +{
> +	struct strbuf out = STRBUF_INIT;
> +	struct strbuf buf = STRBUF_INIT;
> +	char *reencoded, *s, *e;
> +
> +	strbuf_addstr(&buf, buffer);
> +
> +	s = strstr(buf.buf, "\nauthor ");
> +	assert(s != NULL);

Wouldn't this assert trigger in the presence of encodings which contain
ASCII NUL (e.g., wide encodings like utf-16)?

Is there an encoding you have in mind which would be helped by this?

-Peff
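
To make the NUL concern concrete, here is a minimal standalone sketch.
It is not taken from the patch: the helper name find_author_header and
both sample buffers are hypothetical, and it assumes the whole commit
buffer (headers included) is UTF-16LE, so that every ASCII byte is
followed by a NUL byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper mirroring the strstr() lookup in the quoted hunk.
 * Returns the byte offset of "\nauthor " in a NUL-terminated buffer,
 * or -1 if the header is not found.
 */
static long find_author_header(const char *buffer)
{
	const char *s;

	assert(buffer != NULL);
	s = strstr(buffer, "\nauthor ");
	return s ? (long)(s - buffer) : -1;
}

/* Sample text in an ASCII-superset encoding (plain ASCII here). */
static const char ascii_commit[] = "tree 0\nauthor A\n";

/*
 * The same text as UTF-16LE bytes (assumption: the entire buffer is
 * wide-encoded). The literal is split before the digit so that "\0"
 * followed by '0' is not read as a single octal escape.
 */
static const char utf16le_commit[] =
	"t\0r\0e\0e\0 \0" "0\0\n\0a\0u\0t\0h\0o\0r\0 \0A\0\n\0";
```

With these buffers, find_author_header(ascii_commit) finds the header
at byte offset 6, while find_author_header(utf16le_commit) returns -1:
strstr() (like strbuf_addstr(), which relies on strlen()) stops at the
first NUL byte, so it only ever sees the single byte "t".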