Re: [PATCH 00/22] Refactor to accept NUL in commit messages

Jeff King <peff@xxxxxxxx> · Mon, 24 Oct 2011 15:45:58 -0700

On Sun, Oct 23, 2011 at 09:40:51PM -0700, Junio C Hamano wrote:

> >> But as Duy mentions, we have an encoding header. Shouldn't we treat it
> >> like binary goo until we do reencode_log_message, and _then_ we can
> >> break it into lines?
> >
> > That's sensible. If we go that route, I think the "one allocation of
> > separate struct commit_buffer pointed from a pointer field in struct
> > commit to replace the current member 'buffer'" is a reasonable thing
> > to do.
> 
> Having given that "sensible" comment, I am not convinced if this is worth
> it. We are talking about what is left in the ephemeral COMMIT_EDITMSG by
> the chosen editor, but are there really editors that can _only_ write in
> UTF-16 and not in UTF-8, and is it worth bending over backwards to add
> support for such an editor?

Couldn't you make the same argument about iso8859-1, or any other
encoding? The user has some encoding that they want to use, for whatever
reason[1]. We have a slot for an encoding header; is there a reason that
git would allow some encodings and not others?

I mean, besides the obvious that UTF-16 is annoying and contains
embedded NULs and newlines.
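
To make the "embedded NULs and newlines" point concrete, here is a small
sketch (Python, purely illustrative, not anything from git itself). The
helper name suspicious_bytes is made up for this example; it just counts
the 0x00 and 0x0A bytes in the UTF-16LE encoding of a string, which are
the bytes that NUL-terminated, line-oriented code would trip over.

```python
# Illustration only: count the bytes in a UTF-16LE encoding that C-string,
# line-oriented code would misread as a terminator or a line break.
# "suspicious_bytes" is a hypothetical helper, not part of any real tool.
def suspicious_bytes(text, encoding="utf-16-le"):
    data = text.encode(encoding)
    return {"nul": data.count(b"\x00"), "newline": data.count(b"\n")}

# Plain ASCII text: every character gets a 0x00 high byte.
print(suspicious_bytes("fix"))      # {'nul': 3, 'newline': 0}

# U+010A encodes as the bytes 0x0A 0x01 in UTF-16LE, so the text
# contains a "newline" byte without containing any newline character.
print(suspicious_bytes("\u010a"))   # {'nul': 0, 'newline': 1}
```

So even a message with no NUL or newline characters in it can contain
both bytes once it is stored as UTF-16, which is why splitting into
lines before re-encoding is a problem.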

-Peff

[1] English is my first language, so it's rare for me to even step
outside of ASCII, let alone latin1. But aren't there some languages in
which UTF-16 is more efficient than UTF-8?
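
A quick way to check the size question, as a rough sketch (Python, for
illustration only; encoded_sizes is a made-up helper). Characters in the
range U+0800 to U+FFFF, which covers most CJK text, take three bytes in
UTF-8 but two in UTF-16, while ASCII goes the other way.

```python
# Illustration only: compare encoded sizes in bytes as (utf-8, utf-16).
# "encoded_sizes" is a hypothetical helper; utf-16-le is used so that
# no byte-order mark is counted.
def encoded_sizes(text):
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

# ASCII: one byte per character in UTF-8, two in UTF-16.
print(encoded_sizes("commit"))               # (6, 12)

# Three CJK characters (U+65E5 U+672C U+8A9E): three bytes each in
# UTF-8, two bytes each in UTF-16.
print(encoded_sizes("\u65e5\u672c\u8a9e"))   # (9, 6)
```

Real commit messages mix scripts, ASCII punctuation and whitespace, so
the actual saving depends on the text.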