On Wed, Aug 10, 2022 at 08:53:28AM -0700, Junio C Hamano wrote:
> Torsten Bögershausen <tboegi@xxxxxx> writes:
>
> > git log --stat
> > [snip]
> >  Arger.txt | 1 +
> >  Ärger.txt | 1 +
> >  2 files changed, 2 insertions(+)
> >
> > From this very first experiment I would suspect that we use
> > strlen() somewhere rather then utf8.c::git_gcwidth()
>
> Yeah, that does sound like the case, and quite honestly, knowing
> that the diffstat code is way older than unicode-width code, which
> was added by you in mid 2014, I am not all that surprised if we used
> to use strlen() throughout and we still do by mistake.
>
> Thanks for a doze of sanity.

Two updates here:

- The strlen() needs a replacement. It looks as if the following
  patch helps:

/* somewhere in diff.c */
static size_t screen_utf8_width(const char *start)
{
	const char *cp = start;
	size_t remain = strlen(start);
	size_t width = 0;

	while (remain) {
		int n = utf8_width(&cp, &remain);
		/*
		 * utf8_width() sets cp to NULL on invalid UTF-8 and
		 * returns -1 for control characters: use strlen().
		 */
		if (!cp || n < 0)
			return strlen(start);
		width += n;
	}
	return width;
}

@@ -2620,7 +2635,7 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 			continue;
 		}
 		fill_print_name(file);
-		len = strlen(file->print_name);
+		len = screen_utf8_width(file->print_name);
 		if (max_len < len)
 			max_len = len;

@@ -2743,7 +2758,7 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 		 * "scale" the filename
 		 */
 		len = name_width;
-		name_len = strlen(name);
+		name_len = screen_utf8_width(name);
 		if (name_width < name_len) {
=====================================

  Let's see if I can make a proper patch out of it.

- The second problem, and I had hoped it wasn't, seems to be related
  to what you dug out earlier:

> Sounds like a symptom observable when the width computed by
> utf8.c::git_gcwidth(), using the width table imported from
> unicode.org, and the width the terminal thinks each of the displayed
> character has, do not match (e.g. seen when ambiguous characters are
> involved, https://unicode.org/reports/tr11/#Ambiguous).

  That needs a second patch, probably after some more digging into
  how Unicode is rendered on the different systems.
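To illustrate why strlen() misaligns the diffstat, here is a minimal, self-contained sketch. It is not git's code: sketch_display_width() is a hypothetical helper that decodes UTF-8 by hand and assumes every code point takes exactly one terminal column. It therefore ignores the double-width, zero-width and East Asian Ambiguous cases that git_wcwidth() in utf8.c (and the terminal) have to agree on, which is the second problem above.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch, not git's utf8.c: estimate how many terminal
 * columns a NUL-terminated UTF-8 string occupies by counting code
 * points, one column each.  Unlike git_wcwidth(), it ignores
 * double-width (e.g. CJK), zero-width (combining) and East Asian
 * Ambiguous characters, and it does not reject overlong encodings.
 * Like screen_utf8_width() in the mail, it falls back to strlen()
 * when the input is not well-formed UTF-8.
 */
static size_t sketch_display_width(const char *s)
{
	const unsigned char *p = (const unsigned char *)s;
	size_t cols = 0;

	while (*p) {
		size_t len, i;

		if (*p < 0x80)
			len = 1;            /* 0xxxxxxx: ASCII */
		else if ((*p & 0xE0) == 0xC0)
			len = 2;            /* 110xxxxx */
		else if ((*p & 0xF0) == 0xE0)
			len = 3;            /* 1110xxxx */
		else if ((*p & 0xF8) == 0xF0)
			len = 4;            /* 11110xxx */
		else
			return strlen(s);   /* stray continuation or invalid lead byte */

		/* each trailing byte must be 10xxxxxx; a NUL here fails too */
		for (i = 1; i < len; i++)
			if ((p[i] & 0xC0) != 0x80)
				return strlen(s);

		p += len;
		cols++;                     /* assumption: one column per code point */
	}
	return cols;
}
```

For "Ärger.txt", where Ä is the two bytes 0xC3 0x84, strlen() gives 10 while the sketch gives 9, the same as for "Arger.txt", so both names would get the same padding in the diffstat.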