Re: [PATCH 4/4] Only re-encode certain parts in commit object, not the whole

Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> · Wed, 22 Feb 2012 09:01:20 +0700

2012/2/22 Jeff King <peff@xxxxxxxx>:
> On Tue, Feb 21, 2012 at 09:24:52PM +0700, Nguyen Thai Ngoc Duy wrote:
>
>> The commit object has its own format, which happens to be ASCII but is
>> not really subject to re-encoding.
>>
>> There are only four areas that may be re-encoded: author line,
>> committer line, mergetag lines and commit body.  Encoding of tags
>> embedded in mergetag lines is not decided by commit encoding, so leave
>> it out and consider it binary.
>
> Is this worth the effort? Yes, re-encoding the ASCII bits of the commit
> object is unnecessary. But do we actually handle encodings that are not
> ASCII supersets? IOW, I could see the point if this is making it
> possible to hold utf-16 names and messages in your commits (though why
> you would want to do so is beyond me...). But my understanding is that
> this is horribly broken anyway by other parts of the code. And even
> looking at your code below:

No, utf-16 and friends are out of the question. 617 of the 1168
encodings supported by iconv translate chars 10 and 32-126 to something
else, and some of those do not generate NUL. I suppose none of these
are actually used nowadays. Looking again, some don't even
successfully translate the given input. No, it's probably not worth
the effort.
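For illustration, here is a minimal sketch of that kind of probe, assuming a POSIX iconv(3). `preserves_ascii` is a hypothetical helper, not git code. It converts LF plus the printable range 32..126 from "ASCII" into the target encoding and checks whether the bytes come out unchanged:

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of git): convert LF (10) and the
 * printable ASCII range 32..126 from "ASCII" into `encoding` and
 * report whether the bytes come out unchanged.
 * Returns 1 if unchanged, 0 if they differ, -1 if iconv cannot
 * open the encoding or complete the conversion. */
int preserves_ascii(const char *encoding)
{
    char in[96], out[512];
    size_t n = 0;

    in[n++] = '\n';
    for (int c = 32; c <= 126; c++)
        in[n++] = (char)c;
    assert(n == sizeof(in));

    iconv_t cd = iconv_open(encoding, "ASCII");
    if (cd == (iconv_t)-1)
        return -1;

    char *inp = in, *outp = out;
    size_t inleft = n, outleft = sizeof(out);
    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    /* Flush any trailing shift-state sequence (stateful encodings). */
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;

    size_t produced = sizeof(out) - outleft;
    return produced == n && memcmp(in, out, n) == 0;
}
```

Looping this over the names printed by `iconv -l` gives a count of non-ASCII-compatible encodings. The exact numbers depend on which iconv implementation and modules are installed. UTF-8 and ISO-8859-1 pass, while UTF-16 fails because every character becomes two bytes.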
-- 
Duy