On Sun, Oct 23, 2011 at 09:40:51PM -0700, Junio C Hamano wrote:

> >> But as Duy mentions, we have an encoding header. Shouldn't we treat it
> >> like binary goo until we do reencode_log_message, and _then_ we can
> >> break it into lines?
> >
> > That's sensible. If we go that route, I think the "one allocation of
> > separate struct commit_buffer pointed from a pointer field in struct
> > commit to replace the current member 'buffer'" is a reasonable thing
> > to do.
>
> Having given that "sensible" comment, I am not convinced if this is worth
> it. We are talking about what is left in the ephemeral COMMIT_EDITMSG by
> the chosen editor, but are there really editors that can _only_ write in
> UTF-16 and not in UTF-8, and is it worth bending backwards to add support
> such an editor?

Couldn't you make the same argument about iso8859-1, or any other
encoding? The user has some encoding that they want to use, for whatever
reason[1]. We have a slot for an encoding header; is there a reason that
git would allow some encodings and not others?

I mean, besides the obvious that UTF-16 is annoying and contains
embedded NULs and newlines.

-Peff

[1] English is my first language, so it's rare for me to even step
    outside of ASCII, let alone latin1. But aren't there some languages
    in which utf-16 is more efficient than utf-8?
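
To make the two points about UTF-16 concrete (embedded NUL and newline bytes, and size relative to UTF-8), here is a rough Python sketch. The helper names and the sample strings are made up for illustration; none of this is git code, and it only looks at UTF-16LE without a BOM.

```python
# Illustration only: helper names and sample strings are assumptions,
# not anything taken from git or from this thread.

def utf16_hazards(text):
    """Return (has_nul_byte, has_newline_byte) for text encoded as UTF-16LE."""
    data = text.encode("utf-16-le")
    return (b"\x00" in data, b"\x0a" in data)

def sizes(text):
    """Return (utf8_byte_count, utf16_byte_count) for text, without a BOM."""
    return (len(text.encode("utf-8")), len(text.encode("utf-16-le")))

# Plain ASCII: every character gets a 0x00 high byte in UTF-16.
ascii_msg = "fix typo"

# U+010A encodes as the bytes 0A 01 in UTF-16LE, so a byte-oriented
# line splitter would see a "newline" that is not one.
tricky = "\u010a"

# CJK text in the Basic Multilingual Plane: 3 bytes per character in
# UTF-8, 2 bytes per character in UTF-16.
japanese = "日本語のコミットメッセージ"

if __name__ == "__main__":
    print(utf16_hazards(ascii_msg))
    print(utf16_hazards(tricky))
    print(sizes(ascii_msg))
    print(sizes(japanese))
```

So for mostly-ASCII messages UTF-16 doubles the size and is full of NULs, while for text like the Japanese sample it is about a third smaller than UTF-8. Either way, splitting the raw bytes on 0x0A before re-encoding is not safe.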