Hi, On Sun, 1 Feb 2009, Junio C Hamano wrote: > Johannes Schindelin <Johannes.Schindelin@xxxxxx> writes: > > > On Fri, 30 Jan 2009, Geoffrey Thomas wrote: > > > >> Currently, however, printf("%*.*s", width, width, author) is simply > >> wrong, because printf only cares about bytes, not screen columns. Do > >> you think I should fall back on the old behavior if > >> i18n.commitencoding is set, or if at least one of the author names > >> isn't parseable as UTF-8, or something? Or should I be doing this > >> with iconv and assuming all commits are encoded in the current > >> encoding specified via $LANG or $LC_whatever? > > > > I do not know what encoding the author is at that point, but if you > > cannot be sure that it is UTF-8, using utf8_strwidth() is just as > > wrong as the current code, IMHO. > > That is true, but then we are not losing anything. > > This codepath is not about the payload (the contents of the files) but > the author name part of the commit log message, and UTF-8 would probably > be the only sensible encoding to standardize on. Almost agree, except for shops where you have an enforced encoding that you cannot easily change. And last time I checked, many more encodings used 1 character/byte (or for that matter, 1 column / byte) than not; utf8_width would be "more wrong" than strlen() here, because strlen() would "happen to work" here. > If your project uses UTF-8 for everybody, great, we will align them > better than we did before. If not, sorry, you will get a different > misaligned names. There _has_ to be a way to check if the current author string is encoded in UTF-8. All I am asking is that the original poster would put just a _little_ more effort into the issue and make the thing dependent on the knowledge -- as opposed to the assumption -- that the author is encoded in UTF-8. > That assumes utf8_width() does not barf when fed an invalid byte > sequence, but I did not think it is that fragile (I didn't actually > audit the codepath, though). That is the code that barfs in wcwidth: if (ch < 32 || (ch >= 0x7f && ch < 0xa0)) return -1; That is not a big problem, but Geoff's code does not handle that case correctly. For example, in Code-page 437, a name like "£ïñûç" would result in a negative width. But hey, it is definitely not my itch, I will never suffer from the fallout of this patch, as I am safely within US-ASCII. I just thought I saw a potential problem and a possible way out. That's it. Ciao, Dscho