Re: [PATCH] builtin-blame.c: Use utf8_strwidth for author's names

Junio C Hamano <gitster@xxxxxxxxx> · Sun, 01 Feb 2009 22:48:51 -0800

Johannes Schindelin <Johannes.Schindelin@xxxxxx> writes:

> On Fri, 30 Jan 2009, Geoffrey Thomas wrote:
>
>> Currently, however, printf("%*.*s", width, width, author) is simply 
>> wrong, because printf only cares about bytes, not screen columns. Do you 
>> think I should fall back on the old behavior if i18n.commitencoding is 
>> set, or if at least one of the author names isn't parseable as UTF-8, or 
>> something? Or should I be doing this with iconv and assuming all commits 
>> are encoded in the current encoding specified via $LANG or $LC_whatever?
>
> I do not know what encoding the author name is in at that point, but if
> you cannot be sure that it is UTF-8, using utf8_strwidth() is just as
> wrong as the current code, IMHO.

That is true, but then we are not losing anything.

This codepath is not about the payload (the contents of the files) but the
author name part of the commit log message, and UTF-8 would probably be
the only sensible encoding to standardize on.

If your project uses UTF-8 for everybody, great, we will align them better
than we did before.  If not, sorry, you will get differently misaligned
names.

That assumes utf8_width() does not barf when fed an invalid byte sequence,
but I do not think it is that fragile (I haven't actually audited the
codepath, though).
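
To illustrate the bytes-versus-columns point, here is a rough sketch, not
the code from the patch.  approx_width() and pad_to_columns() are
hypothetical helpers made up for this example; approx_width() is a crude
stand-in for utf8_strwidth() that counts one column per code point, so it
ignores double-width and zero-width characters.  Truncating overlong names
is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, NOT git's utf8_strwidth(): approximate the display
 * width by counting every byte that is not a UTF-8 continuation byte
 * (10xxxxxx).  An invalid byte sequence cannot make this fail; it only
 * makes the count approximate.
 */
static int approx_width(const char *s)
{
	int width = 0;

	assert(s != NULL);
	for (; *s; s++)
		if (((unsigned char)*s & 0xC0) != 0x80)
			width++;
	return width;
}

/*
 * Hypothetical helper: write "name" into buf, followed by enough spaces
 * to fill "columns" columns as measured by approx_width().  Names wider
 * than "columns" are written unpadded rather than truncated.
 */
static int pad_to_columns(char *buf, size_t size, const char *name,
			  int columns)
{
	int pad = columns - approx_width(name);

	if (pad < 0)
		pad = 0;
	return snprintf(buf, size, "%s%*s", name, pad, "");
}
```

With a name such as "J\xc3\xbcrgen" (seven bytes, six columns), a
byte-based "%-*.*s" with width 8 appends one space and ends up seven
columns wide, while pad_to_columns() appends two and fills all eight.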
