Re: [PATCH v14 14/27] bisect--helper: `bisect_next` and `bisect_auto_next` shell function in C

Pranit Bauva <pranit.bauva@xxxxxxxxx> · Tue, 30 Aug 2016 23:55:40 +0530

Hey Junio,

Sorry for a late replay.

On Fri, Aug 26, 2016 at 2:00 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Pranit Bauva <pranit.bauva@xxxxxxxxx> writes:
>
>> A lot of parts of bisect.c uses exit() and these signals are then
>> trapped in the `bisect_start` function. Since the shell script ceases
>> its existence it would be necessary to convert those exit() calls to
>> return statements so that errors can be reported efficiently in C code.
>
> Is efficiency really an issue?  I think the real reason is that it
> would make it impossible for the callers to handle errors, if you do
> not convert and let the error codepaths exit().

I think I put the word "efficiently" wrongly over here. Will omit it.

>> @@ -729,7 +735,7 @@ static struct commit **get_bad_and_good_commits(int *rev_nr)
>>       return rev;
>>  }
>>
>> -static void handle_bad_merge_base(void)
>> +static int handle_bad_merge_base(void)
>>  {
>>       if (is_expected_rev(current_bad_oid)) {
>>               char *bad_hex = oid_to_hex(current_bad_oid);
>> @@ -750,17 +756,18 @@ static void handle_bad_merge_base(void)
>>                               "between %s and [%s].\n"),
>>                               bad_hex, term_bad, term_good, bad_hex, good_hex);
>>               }
>> -             exit(3);
>> +             return 3;
>>       }
>>
>>       fprintf(stderr, _("Some %s revs are not ancestor of the %s rev.\n"
>>               "git bisect cannot work properly in this case.\n"
>>               "Maybe you mistook %s and %s revs?\n"),
>>               term_good, term_bad, term_good, term_bad);
>> -     exit(1);
>> +     bisect_clean_state();
>> +     return 1;
>
> What is the logic behind this function sometimes clean the state,
> and some other times do not, when it makes an error-return?  We see
> above that "return 3" codepath leaves the state behind.
>
> Either you forgot a necessary clean_state in "return 3" codepath,
> or you forgot to document why the distinction exists in the in-code
> comment for the function.  I cannot tell which, but I am leaning
> towards guessing that it is the former.

This is a very tricky one. I have purposely not included this after a
lot of testing. I have hand tested with the original git and with this
branch. The reason why anyone wouldn't be able to catch this is
because its not covered in the test suite. I am including a patch with
this as an attachment (because I am behind a proxy right now but don't
worry I will include this as a commit in the next series). The
original behaviour of git does not clean the bisect state when this
situation occurs. On another note which you might have missed that
bisect_clean_state() is purposely put before return 1 which is covered
by the test suite. You can try removing it and see that there is a
broken tes. tI was thinking of including the tests after the whole
conversion but now I think including this before will make the
conversion more easier for review.

>> -static void check_good_are_ancestors_of_bad(const char *prefix, int no_checkout)
>> +static int check_good_are_ancestors_of_bad(const char *prefix, int no_checkout)
>>  {
>>       char *filename = git_pathdup("BISECT_ANCESTORS_OK");
>>       struct stat st;
>> -     int fd;
>> +     int fd, res = 0;
>>
>>       if (!current_bad_oid)
>>               die(_("a %s revision is needed"), term_bad);
>
> Can you let it die yere?

Not really. I should change it to return error().

>> @@ -873,8 +890,11 @@ static void check_good_are_ancestors_of_bad(const char *prefix, int no_checkout)
>>                             filename);
>>       else
>>               close(fd);
>> +
>> +     return 0;
>>   done:
>>       free(filename);
>> +     return 0;
>>  }
>
> Who owns "filename"?  The first "return 0" leaves it unfreed, and
> when "goto done" is done, it is freed.
>
> The above two may indicate that "perhaps 'retval + goto finish'
> pattern?" is a really relevant suggestion for the earlier steps in
> this series.

Yes.

>>       if (!all) {
>>               fprintf(stderr, _("No testable commit found.\n"
>>                       "Maybe you started with bad path parameters?\n"));
>> -             exit(4);
>> +             return 4;
>>       }
>>
>>       bisect_rev = revs.commits->item->object.oid.hash;
>>
>>       if (!hashcmp(bisect_rev, current_bad_oid->hash)) {
>> -             exit_if_skipped_commits(tried, current_bad_oid);
>> +             res = exit_if_skipped_commits(tried, current_bad_oid);
>> +             if (res)
>> +                     return res;
>> +
>>               printf("%s is the first %s commit\n", sha1_to_hex(bisect_rev),
>>                       term_bad);
>>               show_diff_tree(prefix, revs.commits->item);
>>               /* This means the bisection process succeeded. */
>> -             exit(10);
>> +             return 10;
>>       }
>>
>>       nr = all - reaches - 1;
>> @@ -1005,7 +1033,11 @@ int bisect_next_all(const char *prefix, int no_checkout)
>>                 "Bisecting: %d revisions left to test after this %s\n",
>>                 nr), nr, steps_msg);
>>
>> -     return bisect_checkout(bisect_rev, no_checkout);
>> +     res = bisect_checkout(bisect_rev, no_checkout);
>> +     if (res)
>> +             bisect_clean_state();
>> +
>> +     return res;
>>  }
>
> There were tons of "exit_if" that was converted to "if (res) return
> res" above, instead of jumping here to cause clean_state to be
> called.  I cannot tell if this new call to clean_state() is wrong,
> or all the earlier "return res" should come here.  I am guessing the
> latter.

No I don't think its wrong. It is advised in the comment
    In case of mistaken revs or checkout error, or signals received,
    "bisect_auto_next" below may exit or misbehave.
    We have to trap this to be able to clean up using
    "bisect_clean_state".

to clean the bisection state *iff checkout fails* otherwise not.
Luckily this is already covered by the test suite. I think what you
meant is that bisect_clean_state() might be covered elsewhere too then
why perform cleanup here, right? bisect_clean_state() have been
carefully put where ever required only.

>> diff --git a/builtin/bisect--helper.c b/builtin/bisect--helper.c
>> index c64996a..ef7b3a1 100644
>> --- a/builtin/bisect--helper.c
>> +++ b/builtin/bisect--helper.c
>> @@ -8,6 +8,7 @@
>>  #include "run-command.h"
>>  #include "prompt.h"
>>  #include "quote.h"
>> +#include "revision.h"
>>
>>  static GIT_PATH_FUNC(git_path_bisect_terms, "BISECT_TERMS")
>>  static GIT_PATH_FUNC(git_path_bisect_expected_rev, "BISECT_EXPECTED_REV")
>> @@ -29,6 +30,8 @@ static const char * const git_bisect_helper_usage[] = {
>>       N_("git bisect--helper --bisect-terms [--term-good | --term-old | --term-bad | --term-new]"),
>>       N_("git bisect--helper --bisect start [--term-{old,good}=<term> --term-{new,bad}=<term>]"
>>                                             "[--no-checkout] [<bad> [<good>...]] [--] [<paths>...]"),
>> +     N_("git bisect--helper --bisect-next"),
>> +     N_("git bisect--helper --bisect-auto-next"),
>>       NULL
>>  };
>>
>> @@ -396,6 +399,129 @@ static int bisect_terms(struct bisect_terms *terms, const char **argv, int argc)
>>       return 0;
>>  }
>>
>> +static int register_good_ref(const char *refname,
>> +                          const struct object_id *oid, int flags,
>> +                          void *cb_data)
>> +{
>> +     struct string_list *good_refs = cb_data;
>> +     string_list_append(good_refs, oid_to_hex(oid));
>> +     return 0;
>> +}
>> +
>> +static int bisect_next(struct bisect_terms *terms, const char *prefix)
>> +{
>> +     int res, no_checkout;
>> +
>> +     /* In case of mistaken revs or checkout error, or signals received,
>
> That's an unbalanced comment.  You end the block with "*/" on its
> own line, so you should start the block with "/*" on its own line.
> There seems to be at least one more such comment in this patch but I
> won't repeat.

I shall change this.

>> +      * "bisect_auto_next" below may exit or misbehave.
>> +      * We have to trap this to be able to clean up using
>> +      * "bisect_clean_state".
>> +      */
>
> "exit" meaning "call exit() to terminate the process", or something
> else?  If the latter, don't say "exit", but say "return error".

Yes I think its necessary to change "exit" in comments to "return".
There are other places too in which I will have to change.

>> +     if (bisect_next_check(terms, terms->term_good.buf))
>> +             return -1;
>
> Mental note.  The "autostart" in the original is gone.  Perhaps it
> is done by next_check in this code, but we haven't seen that yet.

This will be added back again in a coming patch[1].

>> +     no_checkout = !is_empty_or_missing_file(git_path_bisect_head());
>> +
>> +     /* Perform all bisection computation, display and checkout */
>> +     res = bisect_next_all(prefix , no_checkout);
>> +
>> +     if (res == 10) {
>> +             FILE *fp;
>> +             unsigned char sha1[20];
>> +             struct commit *commit;
>> +             struct pretty_print_context pp = {0};
>> +             struct strbuf commit_name = STRBUF_INIT;
>> +             char *bad_ref = xstrfmt("refs/bisect/%s",
>> +                                           terms->term_bad.buf);
>> +             read_ref(bad_ref, sha1);
>> +             commit = lookup_commit_reference(sha1);
>> +             format_commit_message(commit, "%s", &commit_name, &pp);
>> +             fp = fopen(git_path_bisect_log(), "a");
>> +             if (!fp) {
>> +                     free(bad_ref);
>> +                     strbuf_release(&commit_name);
>> +                     return -1;
>> +             }
>> +             if (fprintf(fp, "# first %s commit: [%s] %s\n",
>> +                         terms->term_bad.buf, sha1_to_hex(sha1),
>> +                         commit_name.buf) < 1){
>> +                     free(bad_ref);
>> +                     strbuf_release(&commit_name);
>> +                     fclose(fp);
>> +                     return -1;
>> +             }
>> +             free(bad_ref);
>> +             strbuf_release(&commit_name);
>> +             fclose(fp);
>> +             return 0;
>
> Doesn't it bother you that you have to write free(bad_ref)...fclose(fp)
> repeatedly?

My mind was more involved in the actual conversion and that was the
main part bothering me. I did all the "clean up" stuff after I did the
actual conversion. And since I was extremely happy after the porting
happened, the "cleanup stuff" didn't bother me much.

>> +     }
>> +     else if (res == 2) {
>> +             FILE *fp;
>> +             struct rev_info revs;
>> +             struct argv_array rev_argv = ARGV_ARRAY_INIT;
>> +             struct string_list good_revs = STRING_LIST_INIT_DUP;
>> +             struct pretty_print_context pp = {0};
>> +             struct commit *commit;
>> +             char *term_good = xstrfmt("%s-*", terms->term_good.buf);
>> +             int i;
>> +
>> +             fp = fopen(git_path_bisect_log(), "a");
>> +             if (!fp) {
>> +                     free(term_good);
>> +                     return -1;
>> +             }
>> +             if (fprintf(fp, "# only skipped commits left to test\n") < 1) {
>> +                     free(term_good);
>> +                     fclose(fp);
>> +                     return -1;
>> +             }
>> +             for_each_glob_ref_in(register_good_ref, term_good,
>> +                                  "refs/bisect/", (void *) &good_revs);
>> +
>> +             free(term_good);
>
> Doesn't it bother you that you have to write free(term_good) repeatedly?
>
>> +             argv_array_pushl(&rev_argv, "skipped_commits", "refs/bisect/bad", "--not", NULL);
>> +             for (i = 0; i < good_revs.nr; i++)
>> +                     argv_array_push(&rev_argv, good_revs.items[i].string);
>> +
>> +             /* It is important to reset the flags used by revision walks
>> +              * as the previous call to bisect_next_all() in turn
>> +              * setups a revision walk.
>> +              */
>> +             reset_revision_walk();
>> +             init_revisions(&revs, NULL);
>> +             rev_argv.argc = setup_revisions(rev_argv.argc, rev_argv.argv, &revs, NULL);
>> +             argv_array_clear(&rev_argv);
>> +             string_list_clear(&good_revs, 0);
>
> Are you sure that the revision walking machinery is prepared to see
> the argv[] and elements in it you have given to setup_revisions()
> gets cleared before it starts doing the real work in
> prepare_revision_walk()?  I am reasonably sure that the machinery
> borrows strings like pathspec elements from the caller and expects
> them to stay during revision traversal.

I have extremely little understanding of revision walking. All what I
know about revision walking is what's covered in
Documentation/technical/api-revision-walking.txt . There it is
mentioned that for multiple revision walking its necessary to reset.
What's happening is that there are multiple calls for revision
walking. Now bisect_next() in itself calls for revision walking two
times (the first one being with bisect_next_all and the next one
conditionally with skip).  This is also the reason why I included a
reset_revision_walk() in bisect.c too in this patch. Before what used
to happen is that when git plumbing commands were called there was no
previous need for resetting the revision walk.  Now that only a C code
exists after bisect_replay() conversion (I faced a problem in a
futuristic patch and thought it would be better to cover it up here),
there would be multiple calls to bisect_next(). Previously it wasn't a
problem because bisect_next_all() was a subcommand called from shell
script and for "skip" it was git-rev-list which did revision walking
as a separate thing. I didn't do reset previously and because of this
I used to get a NULL value in the revs.commits . It would be really
helpful if you could look more into it.

> You seem to have acquired a habit of freeing and clearing things
> early, which is bad.  Instead, make it a habit of free/clear at the
> end, and make sure all error paths go through that central freeing
> site.  That tends to future-proof your code better from leaking even
> when later other people change it.

Sure!

>> +             if (prepare_revision_walk(&revs)) {
>> +                     die(_("revision walk setup failed\n"));
>> +             }
>
> This one is still allowed to die, without clean_state?

I couldn't really test this part of code because I don't know how to
make this call fail. And this isn't covered by the test suite either.
BTW, here also it should be return error().

>> +             while ((commit = get_revision(&revs)) != NULL) {
>> +                     struct strbuf commit_name = STRBUF_INIT;
>> +                     format_commit_message(commit, "%s",
>> +                                           &commit_name, &pp);
>> +                     fprintf(fp, "# possible first %s commit: "
>> +                                 "[%s] %s\n", terms->term_bad.buf,
>> +                                 oid_to_hex(&commit->object.oid),
>> +                                 commit_name.buf);
>> +                     strbuf_release(&commit_name);
>> +             }
>> +
>> +             fclose(fp);
>> +             return res;
>> +     }
>> +
>> +     return res;
>> +}
>
>> @@ -415,47 +541,67 @@ static int bisect_start(struct bisect_terms *terms, int no_checkout,
>>               no_checkout = 1;
>>
>>       for (i = 0; i < argc; i++) {
>> -             if (!strcmp(argv[i], "--")) {
>> +             const char *arg;
>> +             if (starts_with(argv[i], "'"))
>> +                     arg = sq_dequote(xstrdup(argv[i]));
>> +             else
>> +                     arg = argv[i];
>> +             if (!strcmp(arg, "--")) {
>>                       has_double_dash = 1;
>>                       break;
>>               }
>>       }
>
> This is really bad; you are blindly assuming that anything that
> begins with " ' " begins with " ' " because "git-bisect.sh" sq-quoted
> and it never directly came from the command line that _wanted_ to
> give you something that begins with a " ' ".
>
> I suspect that you should be able to dequote on the calling script
> side.  Then all these ugliness would disappear.

I had an unsuccessful attempt at it. Though this ugliness is removed
in the patch[2]. This is specifically because of the line in
bisect_replay() shell function which calls bisect_start().

>>       for (i = 0; i < argc; i++) {
>> -             const char *commit_id = xstrfmt("%s^{commit}", argv[i]);
>> +             const char *arg, *commit_id;
>> +             if (starts_with(argv[i], "'"))
>> +                     arg = sq_dequote(xstrdup(argv[i]));
>> +             else
>> +                     arg = argv[i];
>
> Likewise.
>
>> +             commit_id = xstrfmt("%s^{commit}", arg);
>
> In any case, I think a separate "const char *arg" that is an alias
> to argv[i] in the loop is a very good idea to do from the very
> beginning (i.e. should be done in a much much earlier patch in this
> series).

I will do it in the earlier patch.

[1]: http://public-inbox.org/git/01020156b73fe6d7-8b80c663-7c77-469e-811f-40200ec6dbb1-000000@xxxxxxxxxxxxxxxxxxxxxxx/
[2]: https://public-inbox.org/git/01020156b73fe6e4-d45cf1f7-03a3-4566-95d1-73788c5ab2f9-000000@xxxxxxxxxxxxxxxxxxxxxxx/

Regards,
Pranit Bauva
Attachment:
out2

Description: Binary data