Re: [PATCH v2 1/1] diff.c: When appropriate, use utf8_strwidth()

Junio C Hamano <gitster@xxxxxxxxx> · Mon, 29 Aug 2022 11:37:05 -0700

Torsten Bögershausen <tboegi@xxxxxx> writes:

> This is caused by the logic in diff.c:
>   /*
>    * Find the longest filename and max number of changes
>    */
>    for (i = 0; (i < count) && (i < data->nr); i++) {
>        struct diffstat_file *file = data->files[i];
>        [snip]
>        len = utf8_strwidth(file->print_name);
>        if (max_width < len)
>           max_width = len;
> // and later
>     /*
>      * From here name_width is the width of the name area,
>      * and graph_width is the width of the graph area.
>      * max_change is used to scale graph properly.
>      */
>     for (i = 0; i < count; i++) {
>     /*
>      * "scale" the filename
>      */
>      // TB: Which means either shortening it with ...
>      // Or padding it, if needed, and here we need
>      // another
>      name_len = utf8_strwidth(name);
>
>>
>> Besides, since the simple change from `strlen()` to `utf8_strwidth()` is
>> so different from changing `strbuf_addf(...)`, I would prefer to see them
>> split into two patches.
>
> Hm, that is a possiblity. Seems to ease the burden for reviewers.

Another thing I remembered (this is a comment primarily on the
original I wrote based on 'all world is ASCII' mindset that led to
the use of strlen() as a display-width indicator) in the code is
that we "abbreviate" an overly long pathname and transform renames
that originally is in the a/b/c -> a/B/c form into a/{b->B}/c form,
and IIRC they are all byte based.  The latter may be OK because the
transformation is limited to '/' boundary, but the former may chomp
a single multi-byte letter in the middle, which would need to be
corrected as a part of this change.