Re: [PATCH 12/29] bisect--helper: reimplement `bisect_next` and `bisect_auto_next` shell functions in C

Johannes Schindelin <Johannes.Schindelin@xxxxxx> · Mon, 17 Feb 2020 23:00:55 +0100 (CET)

Hi Chris,

On Mon, 17 Feb 2020, Christian Couder wrote:

> On Thu, Jan 30, 2020 at 11:47 PM Johannes Schindelin
> <Johannes.Schindelin@xxxxxx> wrote:
> >
> > On Mon, 20 Jan 2020, Miriam Rubio wrote:
>
> > > +static void prepare_rev_argv(struct bisect_terms *terms, struct argv_array *rev_argv)> > +{
> > > +     struct string_list good_revs = STRING_LIST_INIT_DUP;
> > > +     char *term_good = xstrfmt("%s-*", terms->term_good);
> > > +
> > > +     for_each_glob_ref_in(register_good_ref, term_good,
> > > +                          "refs/bisect/", &good_revs);
> > > +
> > > +     argv_array_pushl(rev_argv, "skipped_commits", "refs/bisect/bad", "--not", NULL);
> > > +     for (int i = 0; i < good_revs.nr; i++)
> > > +             argv_array_push(rev_argv, good_revs.items[i].string);
> > > +
> > > +     string_list_clear(&good_revs, 0);
> > > +     free(term_good);
> > > +}
> >
> > Maybe we should fold that into `prepare_revs()`? We could then render the
> > arguments directly into `revs` (via `add_pending_object()`, after setting
> > obj->flags |= UNINTERESTING`) rather than formatting them into a string
> > list, then deep-copy them into an `argv_array` only to parse them back
> > into OIDs that we already had in the first place.
>
> The current code is a straightforward port from shell. If we do what
> you suggest, yeah, it will be less wasteful, but on the other hand it
> will be less easy to see that we are doing a good job of properly
> porting code from shell.

If you reason that way, you will have to use tons of `run_command()` calls
to translate the shell code as verbatim as possible.

However, as you can see from our commit history, we do not do that.
Instead, we use the more powerful expressiveness of C to come up with more
elegant code than to slavishly convert shell code to inelegant C code.

> I suggest we try to focus on the later rather than the former right now,
> especially as performance is not very important here.

Oh, but my comment was totally not about performance, and pretty much
exclusively about readability.

If Miriam goes with my suggestion, it will result in more readable code
that is easier to review and therefore much more likely to be free of
unintentional bugs.

> Using small functions also makes it easy to see that we are properly
> releasing memory. A previous version of this code had everything into
> a big function that used goto statements and it was less clear that we
> released everything.

If you want to make it easier to avoid double-free()s and memory leaks, I
am a bit puzzled how you want to claim that the current "we're copying the
strings so often that pretty much everybody loses track of them" approach
should be superior to adding the strings once, and once only, to a string
array.

> > > +static int bisect_skipped_commits(struct bisect_terms *terms)
> > > +{
> > > +     int res = 0;
> > > +     FILE *fp = NULL;
> > > +     struct rev_info revs;
> > > +
> > > +     fp = fopen(git_path_bisect_log(), "a");
> > > +     if (!fp)
> > > +             return error_errno(_("could not open '%s' for appending"),
> > > +                               git_path_bisect_log());
> > > +
> > > +     res = prepare_revs(terms, &revs);
> > > +
> > > +     if (!res)
> > > +             res = process_skipped_commits(fp, terms, &revs);
> > > +
> > > +     fclose(fp);
> > > +     return res;
> > > +}
> >
> > This is again a very short wrapper around another function, so it will
> > probably make sense to merge the two, otherwise the boilerplate might very
> > well outweigh the actual code doing actual work.
>
> Yeah, there is perhaps a significant amount of boiler plate, but the
> code is much easier to check for leaks than when everything was in the
> same big function and there were goto statements, so I think it's a
> reasonable trade-off

Given this snippet, I would strongly disagree with this assessment:

    fp = fopen(git_path_bisect_log(), "a");
    if (!fp)
            res = error_errno(_("could not open '%s' for appending"),
                              git_path_bisect_log());
    else
            res = prepare_revs(terms, &revs);

    if (!res)
            res = process_skipped_commits(fp, terms, &revs);

    if (fp)
            fclose(fp);

There is positively no need for a `goto` whatsoever.

> > > +             fclose(fp);
> > > +     } else {
> > > +             res = error_errno(_("could not open '%s' for "
> > > +                                 "appending"),
> > > +                               git_path_bisect_log());
> > > +     }
> >
> > This pattern of opening a file, writing something into it, and then return
> > success, otherwise failure, seems like a repeated pattern. In other words,
> > it would be a good candidate for factoring out into its own function.
>
> Yeah, but it seems that in this patch series we use the pattern only
> once. So I think it's fair to leave that for another patch series with
> cleanups and performance improvements or perhaps for microprojects.

Sure, we could repeat past mistakes in this patch series, too.

If, on the other hand, we use this patch series as "an excuse" to
introduce such a helper, no future patch series will have to use the same
kind of argument as you just offered. Instead, we will have an improved
API that will help not only this patch series, but many more to come.

There is tons of precedent for this kind of thing, where we add an
introductory patch at the beginning of a patch series, factoring out
already-existing code into a more reusable shape, and then use it.

So why not repeat that pattern and do the same thing here?

Ciao,
Johannes