On Mon, Aug 29, 2022 at 02:04:42PM +0200, Johannes Schindelin wrote: > Hi Torsten, > > > > The choosen solution is to split code in diff.c like this > > > > strbuf_addf(&out, "%-*s", len, name); > > > > into something like this: > > > > size_t num_padding_spaces = 0; > > // [snip] > > if (len > utf8_strwidth(name)) > > num_padding_spaces = len - utf8_strwidth(name); > > strbuf_addf(&out, "%s", name); > > if (num_padding_spaces) > > strbuf_addchars(&out, ' ', num_padding_spaces); > > ... this sounds like it would benefit from beinv refactored into a > separate function, e.g. `strbuf_add_padded(buf, utf8string)`, both for > readability as well as for self-documentation. Yes, but: All (tm) strbuf() functions use an unsigned size_t, and are not tolerant against passing 0 as "do nothing". A nicer solution (for this patch) could be a change like this: Instead of void strbuf_addchars(struct strbuf *sb, int c, size_t n) { strbuf_grow(sb, n); memset(sb->buf + sb->len, c, n); strbuf_setlen(sb, sb->len + n); } We would find: void strbuf_addchars(struct strbuf *sb, int c, ssize_t n) { if (n <= 0) return; strbuf_grow(sb, (size_t)n); memset(sb->buf + sb->len, c, (size_t)n); strbuf_setlen(sb, sb->len + (size_t)n); } I couldn't convince myself to do so. Since it is mainly diff.c that needs this adjustment/padding of strings, I coulnd't convince myself to write another function in strbuf.c > > Also, it is unclear to me why we have to evaluate `utf8_strwidth()` > _twice_ and why we do not assign the result to a variable called `width` > and then have a conditional like > > if (width < len) /* pad to `len` columns */ > strbuf_addchars(&out, ' ' , len - width); > > instead. That would sound more logical to me. This is caused by the logic in diff.c: /* * Find the longest filename and max number of changes */ for (i = 0; (i < count) && (i < data->nr); i++) { struct diffstat_file *file = data->files[i]; [snip] len = utf8_strwidth(file->print_name); if (max_width < len) max_width = len; // and later /* * From here name_width is the width of the name area, * and graph_width is the width of the graph area. * max_change is used to scale graph properly. */ for (i = 0; i < count; i++) { /* * "scale" the filename */ // TB: Which means either shortening it with ... // Or padding it, if needed, and here we need // another name_len = utf8_strwidth(name); > > Besides, since the simple change from `strlen()` to `utf8_strwidth()` is > so different from changing `strbuf_addf(...)`, I would prefer to see them > split into two patches. Hm, that is a possiblity. Seems to ease the burden for reviewers. > > > > > Tests: > > Two things need to be tested: > > - The calculation of the maximum width > > - The calculation of num_padding_spaces > > > > The name "textfile" is changed into "textfilë", both have a width of 8. > > If strlen() was used, to get the maximum width, the shorter "binfile" would > > have been mis-aligned: > > binfile | [snip] > > textfilë | [snip] > > > > If only "binfile" would be renamed into "binfilë": > > binfilë | [snip] > > textfile | [snip] > > > > In order to verify that the width is calculated correctly everywhere, > > "binfile" is renamed into "binfïlë", giving 2 bytes more in strlen() > > "textfile" is renamed into "textfilë", 1 byte more in strlen(), > > and the updated t4012-diff-binary.sh checks the correct aligment: > > binfïlë | [snip] > > textfilë | [snip] > > I wonder whether you can change only _one_ name and still verify the > correctness. When you make two changes at the same time, it is always > possible for one change to "cancel out" the other one, and therefore it is > harder to reason about the correctness of your patch. Nee, I have a hard time to see how a +/- 1 can "cancel out" a +/- 2. But I may improve the commit message, to make that more clear. > > Better keep it simple and change only one instance (personally, > I would have changed two letters in the longer one). That is certainly doable. > > > > > Reported-by: Alexander Meshcheryakov <alexander.s.m@xxxxxxxxx> > > Signed-off-by: Torsten Bögershausen <tboegi@xxxxxx> > > --- > > diff.c | 37 +++++++++++++++++++++++-------------- > > t/t4012-diff-binary.sh | 14 +++++++------- > > 2 files changed, 30 insertions(+), 21 deletions(-) > > > > diff --git a/diff.c b/diff.c > > index 974626a621..cf38e1dc88 100644 > > --- a/diff.c > > +++ b/diff.c > > @@ -2591,7 +2591,7 @@ void print_stat_summary(FILE *fp, int files, > > static void show_stats(struct diffstat_t *data, struct diff_options *options) > > { > > int i, len, add, del, adds = 0, dels = 0; > > - uintmax_t max_change = 0, max_len = 0; > > + uintmax_t max_change = 0, max_width = 0; > > Why rename `max_len`, but not `len`? I would have expected (and agreed > with seeing) `len` to be renamed to `width`, too. That is a valid point. There is, however, already a variable called "width". And renaming this one into a new one as well ? > > > int total_files = data->nr, count; > > int width, name_width, graph_width, number_width = 0, bin_width = 0; > > const char *reset, *add_c, *del_c; > > @@ -2620,9 +2620,9 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options) > > continue; > > } > > fill_print_name(file); > > - len = strlen(file->print_name); > > - if (max_len < len) > > - max_len = len; > > + len = utf8_strwidth(file->print_name); > > + if (max_width < len) > > + max_width = len; > > > > if (file->is_unmerged) { > > /* "Unmerged" is 8 characters */ > > @@ -2646,7 +2646,7 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options) > > > > /* > > * We have width = stat_width or term_columns() columns total. > > - * We want a maximum of min(max_len, stat_name_width) for the name part. > > + * We want a maximum of min(max_width, stat_name_width) for the name part. > > * We want a maximum of min(max_change, stat_graph_width) for the +- part. > > * We also need 1 for " " and 4 + decimal_width(max_change) > > * for " | NNNN " and one the empty column at the end, altogether > > @@ -2701,8 +2701,8 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options) > > graph_width = options->stat_graph_width; > > > > name_width = (options->stat_name_width > 0 && > > - options->stat_name_width < max_len) ? > > - options->stat_name_width : max_len; > > + options->stat_name_width < max_width) ? > > + options->stat_name_width : max_width; > > It is a bit sad that the diff lines regarding the renamed variable drown > out the actual change (`strlen()` -> `utf8_strwidth()`). But the end > result is nicer. > > Thank you for working on this! > Dscho Thanks so much for the review - let's see if I can make a better patch the next days (better say weeks)