Hi, On Wed, 26 Feb 2020, Miriam Rubio wrote: > diff --git a/builtin/bisect--helper.c b/builtin/bisect--helper.c > index f9b04bee23..fedcad4d9b 100644 > --- a/builtin/bisect--helper.c > +++ b/builtin/bisect--helper.c > @@ -833,6 +835,74 @@ static int bisect_autostart(struct bisect_terms *terms) > return BISECT_OK; > } > > +static char *bisect_head(void) > +{ > + if (is_empty_or_missing_file(git_path_bisect_head())) > + return "HEAD"; > + else > + return "BISECT_HEAD"; It is relatively common, and makes it easier to read at least to me, to omit the `else` here: the conditional `return` already leaves the function early. > +} > + > +static enum bisect_error bisect_state(struct bisect_terms *terms, const char **argv, > + int argc) > +{ > + const char *state = argv[0]; > + > + if (check_and_set_terms(terms, state)) > + return BISECT_FAILED; > + > + if (!argc) > + return error(_("Please call `--bisect-state` with at least one argument")); > + > + if (argc == 1 && one_of(state, terms->term_good, > + terms->term_bad, "skip", NULL)) { > + const char *bisected_head = xstrdup(bisect_head()); Do we really want to `strdup()` it? That would leak memory, no? And I don't think that we need this to be allocated anywhere. It's not like we are trying to modify the characters or that we need to take custody of the string: `bisect_head()` will return a static, immutable string. But then, it is not very natural for C code to let `bisect_head()` return a string. Why not return the already-parsed object ID? That would also make that function more robust: instead of expecting `BISECT_HEAD` to be stored in the file `BISECT_HEAD` in the `.git/` directory, you could simply call `get_oid("BISECT_HEAD", &oid)` and if that fails, fall back to do the same with `HEAD`. That way, you tap into the refs machinery and the bisect code would be less dependent on implementation details. > + const char *hex[1]; > + struct object_id oid; > + > + if (get_oid(bisected_head, &oid)) > + return error(_("Bad rev input: %s"), bisected_head); > + if (bisect_write(state, oid_to_hex(&oid), terms, 0)) > + return BISECT_FAILED; > + > + *hex = xstrdup(oid_to_hex(&oid)); > + check_expected_revs(hex, 1); Hmm. Do we expect `check_expected_revs()` to modify what is passed into it? If not, we could probably simplify this a lot (and avoid another memory leak) by something like this: const char *hex; [...] hex = oid_to_hex(&oid); check_expected_revs(&hex, 1); However, taking a step back, I wonder whether it makes sense for `check_expected_revs()` to accept `const char **revs` instead of `struct object_id *oids`. Looking at the definition of `check_expected_revs()`, I would think that it would be better to pass object IDs in, rather than strings: static int is_expected_rev(const char *expected_hex) { struct strbuf actual_hex = STRBUF_INIT; int res = 0; if (strbuf_read_file(&actual_hex, git_path_bisect_expected_rev(), 0) >= 40) { strbuf_trim(&actual_hex); res = !strcmp(actual_hex.buf, expected_hex); } strbuf_release(&actual_hex); return res; } static void check_expected_revs(const char **revs, int rev_nr) { int i; for (i = 0; i < rev_nr; i++) { if (!is_expected_rev(revs[i])) { unlink_or_warn(git_path_bisect_ancestors_ok()); unlink_or_warn(git_path_bisect_expected_rev()); } } } There are a couple of observation that immediately come to my mind: - Reading the bisect_expected_rev file for each rev, only to immediately release it before the next iteration, is wasteful - We can break out of the loop immediately once `is_expected_rev()` returns 0 because we are then unlinking the very file that `is_expected_rev()` checks against. - As I suspected above, a much cleaner interface would use `struct object_id **revs`. - Keeping the code of `is_expected_rev()` separate from `check_expected_revs()` does not make the code more readable for me, but actually less readable instead. - Why are we hard-coding the 40? At this time and age, we've _got_ to use `the_hash_algo->hexsz` instead. In other words, I would do something like this instead: static void check_expected_revs(struct object_id **revs, int rev_nr) { struct object_id expected; struct strbuf buf = STRBUF_INIT; int i; if (!rev_nr) return; if (strbuf_read_file(&buf, git_path_bisect_expected_rev(), 0) < the_hash_algo->hexsz || get_oid_hex(buf.buf, &expected) < 0) return; /* Ignore invalid file contents */ for (i = 0; i < rev_nr; i++) if (!oideq(revs[i], &expected)) { ... unlink ... return; } } However, taking _yet_ another step back, it seems a bit silly to compare many revs (that need to be provided as 40-digit hex strings) to the contents of a file that we expect to contain exactly one 40-digit hex string. Surely, we do not want to take an arbitrary number of revisions, then expect all of them to be identical to the _same_ rev, and otherwise delete that file from where that rev came from? We could even skip reading that file if two or more revisions were passed to `check_expected_revs()` unless _all_ of them are identical! So let's look at the actual caller of this thing: bisect_state() { bisect_autostart state=$1 git bisect--helper --check-and-set-terms $state $TERM_GOOD $TERM_BAD || exit get_terms case "$#,$state" in 0,*) die "Please call 'bisect_state' with at least one argument." ;; 1,"$TERM_BAD"|1,"$TERM_GOOD"|1,skip) bisected_head=$(bisect_head) rev=$(git rev-parse --verify "$bisected_head") || die "$(eval_gettext "Bad rev input: \$bisected_head")" git bisect--helper --bisect-write "$state" "$rev" "$TERM_GOOD" "$TERM_BAD" || exit git bisect--helper --check-expected-revs "$rev" ;; 2,"$TERM_BAD"|*,"$TERM_GOOD"|*,skip) shift hash_list='' for rev in "$@" do sha=$(git rev-parse --verify "$rev^{commit}") || die "$(eval_gettext "Bad rev input: \$rev")" hash_list="$hash_list $sha" done for rev in $hash_list do git bisect--helper --bisect-write "$state" "$rev" "$TERM_GOOD" "$TERM_BAD" || exit done git bisect--helper --check-expected-revs $hash_list ;; *,"$TERM_BAD") die "$(eval_gettext "'git bisect \$TERM_BAD' can take only one argument.")" ;; *) usage ;; esac bisect_auto_next } So there are actually two callers: the first one in the 1,* arm, the second one in the 2,* arm. The first one is easy: it only passes a single rev to `git bisect--helper --check-expected-revs`. The second caller indeed can pass multiple arguments to `check_expected_revs()`, but only if the first argument was not `bad`. But this whole shell function is super ugly, and I do not believe that we're served well by translating it verbatim to even super uglier C code. I propose to instead rewrite it substantially, in the following way: - Handle the `argc == 0` case first thing: in this case, die, complaining that we need at least one argument. (You already did that. Although `state` was already assigned to `argv[0]` at that stage, and I think we don't really want that, just in case that _one_ call chain ends up using `NULL, 0` for `argv, argc`.) - Next, assign `state = argv[0]` and verify that it is one of `good`, `bad` or `skip`, otherwise erroring out. - At this stage, we should do the equivalent of a `shift`: `argv++; argc--;` - Now, we probably need to special-case the `bad` case and verify that `argc <= 1`, otherwise error out. Side note: I do _not_ understand this restriction. Why shouldn't it be valid to pass several revisions to `git bisect bad <revisions>...`? - At this point, if `argc == 0`, we should parse `bisect_head()` into a `struct object_id` and use that as `revs`, setting `argc = 0`. Otherwise, if `argc > 0`, we should populate a `struct object_array` and use that as `revs`. - And finally, we should iterate over these revs and call `bisect_write()` with them, and in the same loop we can also take care of the expected rev. It does not make sense to keep those separate from one another (i.e. to have _two_ loops). That would be a much more natural, and readable, flow, at least to me. > + } > + > + if ((argc == 2 && !strcmp(state, terms->term_bad)) || > + one_of(state, terms->term_good, "skip", NULL)) { > + int i; > + struct string_list hex = STRING_LIST_INIT_DUP; > + > + for (i = 1; i < argc; i++) { > + struct object_id oid; > + > + if (get_oid(argv[i], &oid)) { The original does `git rev-parse --verify "$rev^{commit}"` here, so I think we really want `get_oid_commit()` here, not just `get_oid()`. > + string_list_clear(&hex, 0); > + return error(_("Bad rev input: %s"), argv[i]); > + } > + string_list_append(&hex, oid_to_hex(&oid)); Converting the parsed object ID to hex and storing _that_ as a string and throwing away the parsed object ID looks not very much like C, but much more like shell script. Let's not do that. Let's store the parsed data in a proper `struct object_array`, and only resort to `oid_to_hex()` when we _have_ to. > + } > + for (i = 0; i < hex.nr; i++) { > + const char **hex_string = (const char **) &hex.items[i].string; > + if (bisect_write(state, *hex_string, terms, 0)) { > + string_list_clear(&hex, 0); > + return BISECT_FAILED; > + } > + check_expected_revs(hex_string, 1); > + string_list_clear(&hex, 0); > + return bisect_auto_next(terms, NULL); > + } > + > + if (!strcmp(state, terms->term_bad)) > + return error(_("'git bisect %s' can take only one argument."), > + terms->term_bad); That is an awful late time in the function for a reviewer to read about a condition that should be an early error. > + > + return BISECT_FAILED; > +} > + > int cmd_bisect__helper(int argc, const char **argv, const char *prefix) > { > enum { > [...] > @@ -944,6 +1017,13 @@ int cmd_bisect__helper(int argc, const char **argv, const char *prefix) > set_terms(&terms, "bad", "good"); > res = bisect_autostart(&terms); > break; > + case BISECT_STATE: > + if (!argc) > + return error(_("--bisect-state requires at least one revision")); Wait, we _already_ test for that here? Then we do not need the very same check in `bisect_state()`. I'd rather have it in `bisect_state()`. Besides, we're in a `cmd_*()` function, so we cannot return `error()` because that translates to `-1` and `cmd_*()` functions need to return exit codes, i.e. numbers between 0 and 127. > + set_terms(&terms, "bad", "good"); > + get_terms(&terms); > + res = bisect_state(&terms, argv, argc); > + break; > default: > return error("BUG: unknown subcommand '%d'", cmdmode); Granted, this incorrect `return error()` in a `cmd_*()` function was there before your patch. It is just as incorrect, though. > } > diff --git a/git-bisect.sh b/git-bisect.sh > index 049ffacdff..7f10187055 100755 > --- a/git-bisect.sh > +++ b/git-bisect.sh > [...] > @@ -185,8 +139,7 @@ exit code \$res from '\$command' is < 0 or >= 128" >&2 > state="$TERM_GOOD" > fi > > - # We have to use a subshell because "bisect_state" can exit. > - ( bisect_state $state >"$GIT_DIR/BISECT_RUN" ) > + ( git bisect--helper --bisect-state $state >"$GIT_DIR/BISECT_RUN" ) This subshell looks pointless. Before, we wanted to be able to catch the `die` in `bisect_state`. But since the `bisect--helper` cannot cause the _shell script_ to exit, we can just drop the subshell here. Phew, that was quite a lot, and the review of this patch also took two hours. I'll have to stop here, once again, and hope that I will find time soon to continue the review. I left you quite a bit of work here, but I hope that all of my suggestions are clear enough to act on. If not, please holler. Ciao, Dscho > res=$? > > cat "$GIT_DIR/BISECT_RUN" > @@ -201,7 +154,7 @@ exit code \$res from '\$command' is < 0 or >= 128" >&2 > if [ $res -ne 0 ] > then > eval_gettextln "bisect run failed: > -'bisect_state \$state' exited with error code \$res" >&2 > +'git bisect--helper --bisect-state \$state' exited with error code \$res" >&2 > exit $res > fi > > @@ -242,7 +195,7 @@ case "$#" in > start) > git bisect--helper --bisect-start "$@" ;; > bad|good|new|old|"$TERM_BAD"|"$TERM_GOOD") > - bisect_state "$cmd" "$@" ;; > + git bisect--helper --bisect-state "$cmd" "$@" ;; > skip) > bisect_skip "$@" ;; > next) > -- > 2.25.0 > >