Re: [PATCH] builtin-blame.c: Use utf8_strwidth for author's names

Junio C Hamano <gitster@xxxxxxxxx> · Mon, 02 Feb 2009 20:30:30 -0800

Johannes Schindelin <Johannes.Schindelin@xxxxxx> writes:

> And last time I checked, many more encodings used 1 character/byte (or for 
> that matter, 1 column / byte) than not; utf8_width would be "more wrong" 
> than strlen() here, because strlen() would "happen to work" here.

Ahh, you are absolutely right here, and using utf8_width without first
checking the encoding actively breaks things.

> There _has_ to be a way to check if the current author string is encoded 
> in UTF-8.  All I am asking is that the original poster would put just a 
> _little_ more effort into the issue and make the thing dependent on the 
> knowledge -- as opposed to the assumption -- that the author is encoded in 
> UTF-8.

Yeah, that makes sense.
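
To make "knowledge, as opposed to assumption" concrete, here is a rough
sketch of one way to do it: validate the bytes as UTF-8 first, and fall
back to the byte count when they are not.  The names looks_like_utf8()
and author_columns() are made up for illustration and are not taken from
the patch or from git's utf8.c.  The sketch also assumes one column per
code point, which a real version would replace with a proper
wcwidth-style lookup.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not git's API): returns 1 if buf[0..len) is
 * well-formed UTF-8, 0 otherwise.  Rejects overlong forms, surrogates
 * and code points above U+10FFFF.
 */
static int looks_like_utf8(const unsigned char *buf, size_t len)
{
	size_t i = 0;

	while (i < len) {
		unsigned char c = buf[i];
		unsigned long cp, min;
		size_t need;

		if (c < 0x80) {
			i++;
			continue;
		} else if ((c & 0xe0) == 0xc0) {
			need = 1; cp = c & 0x1f; min = 0x80;
		} else if ((c & 0xf0) == 0xe0) {
			need = 2; cp = c & 0x0f; min = 0x800;
		} else if ((c & 0xf8) == 0xf0) {
			need = 3; cp = c & 0x07; min = 0x10000;
		} else {
			return 0; /* stray continuation byte or invalid lead byte */
		}
		if (len - i - 1 < need)
			return 0; /* sequence truncated at end of buffer */
		for (size_t k = 1; k <= need; k++) {
			if ((buf[i + k] & 0xc0) != 0x80)
				return 0;
			cp = (cp << 6) | (buf[i + k] & 0x3f);
		}
		if (cp < min || cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff))
			return 0;
		i += need + 1;
	}
	return 1;
}

/*
 * Hypothetical helper: column count for an author name.
 * Assumption: if the bytes are not valid UTF-8, treat the name as a
 * one-byte-per-column encoding, i.e. the strlen() behaviour.
 * Assumption: every UTF-8 code point occupies exactly one column.
 */
static size_t author_columns(const char *name)
{
	const unsigned char *p = (const unsigned char *)name;
	size_t len = strlen(name), cols = 0;

	if (!looks_like_utf8(p, len))
		return len;
	for (size_t i = 0; i < len; i++)
		if ((p[i] & 0xc0) != 0x80) /* count lead bytes only */
			cols++;
	return cols;
}
```

With this, "Jos\xc3\xa9" (UTF-8) and "Jos\xe9" (Latin-1) both come out
as 4 columns, whereas plain strlen() would report 5 for the former.
Validation is only a heuristic, though: a string in another encoding can
happen to be valid UTF-8, so an explicitly recorded encoding, where one
is available, would be a better source of knowledge.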

> That is the code that barfs in wcwidth:
>
>         if (ch < 32 || (ch >= 0x7f && ch < 0xa0))
>                 return -1;
>
> That is not a big problem, but Geoff's code does not handle that case 
> correctly.

Thanks for checking --- I suspected something like that would be there
somewhere.
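
For the record, a minimal sketch of what handling that case could look
like.  char_width() below just wraps the check quoted above and, as a
simplification, returns 1 for everything else; string_width() is a
made-up name for the summing loop.  Counting non-printable characters as
zero columns is an assumption on my part, not something the patch or
wcwidth prescribes.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for a wcwidth-style function: -1 for control
 * characters (the check quoted above), 1 for anything else.  A real
 * table would also return 0 and 2 for some code points.
 */
static int char_width(unsigned int ch)
{
	if (ch < 32 || (ch >= 0x7f && ch < 0xa0))
		return -1;
	return 1;
}

/*
 * Sum the widths of n code points.  A naive "total += char_width(ch)"
 * would subtract a column for every control character; here negative
 * results are skipped instead (assumed policy: zero columns).
 */
static int string_width(const unsigned int *cps, size_t n)
{
	int total = 0;

	for (size_t i = 0; i < n; i++) {
		int w = char_width(cps[i]);

		if (w > 0)
			total += w;
	}
	return total;
}
```

For the sequence 'A', TAB, 'B' this gives 2, where the unguarded sum
would give 1.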