Re: [PATCH 12/29] bisect--helper: reimplement `bisect_next` and `bisect_auto_next` shell functions in C

Johannes Schindelin <Johannes.Schindelin@xxxxxx> · Thu, 30 Jan 2020 23:47:09 +0100 (CET)

Hi Miriam,

I started looking at this patch, and will just send the comments, but
please note that I would not mind at all leaving the review for later,
when the libifying patches that you kept in v2 (and probably will send out
a v3 for) made it into `next` and you send the remainder as a new patch
series.

On Mon, 20 Jan 2020, Miriam Rubio wrote:

> From: Pranit Bauva <pranit.bauva@xxxxxxxxx>
>
> Reimplement the `bisect_next()` and the `bisect_auto_next()` shell functions
> in C and add the subcommands to `git bisect--helper` to call them from
> git-bisect.sh .
>
> Using `--bisect-next` and `--bisect-auto-start` subcommands is a
> temporary measure to port shell function to C so as to use the existing
> test suite. As more functions are ported, `--bisect-auto-start`
> subcommand will be retired and will be called by some other methods.

This still sounds clear enough.

> Mentored-by: Lars Schneider <larsxschneider@xxxxxxxxx>
> Mentored-by: Christian Couder <chriscool@xxxxxxxxxxxxx>
> Mentored-by: Johannes Schindelin <Johannes.Schindelin@xxxxxx>
> Signed-off-by: Pranit Bauva <pranit.bauva@xxxxxxxxx>
> Signed-off-by: Tanushree Tumane <tanushreetumane@xxxxxxxxx>
> Signed-off-by: Miriam Rubio <mirucam@xxxxxxxxx>
> ---
>  bisect.c                 |  10 +++
>  builtin/bisect--helper.c | 174 ++++++++++++++++++++++++++++++++++++++-
>  git-bisect.sh            |  47 ++---------
>  3 files changed, 188 insertions(+), 43 deletions(-)
>
> diff --git a/bisect.c b/bisect.c
> index 33f2829c19..1c13da8a28 100644
> --- a/bisect.c
> +++ b/bisect.c
> @@ -635,6 +635,12 @@ static void bisect_rev_setup(struct repository *r, struct rev_info *revs,
>  	struct argv_array rev_argv = ARGV_ARRAY_INIT;
>  	int i;
>
> +	/*
> +	 * Since the code is slowly being converted to C, there might be
> +	 * instances where the revisions were initialized before. Thus
> +	 * we first need to reset it.
> +	 */

This comment sounds good right now, but is prone to get stale rather
quickly.

But that's actually remedied rather easily: if the comment is reworded
only slightly, to avoid talking about the conversion process, and to
mention instead that `revs` could have been used before, then we're
golden.

> +	reset_revision_walk();
>  	repo_init_revisions(r, revs, prefix);
>  	revs->abbrev = 0;
>  	revs->commit_format = CMIT_FMT_UNSPECIFIED;
> @@ -971,6 +977,10 @@ void read_bisect_terms(const char **read_bad, const char **read_good)
>   * finished successfully.
>   * In this case the calling function or command should not turn a -10
>   * return code into an error or a non zero exit code.

I'd like to have an empty line here (well, a line that only contains an
indented `*`).

> + * This returned -10 is checked in bisect_helper::bisect_next() and
> + * eventually transformed to 0 at the end of
> + * bisect_helper::cmd_bisect__helper().

This says _what_ it does. But not why. I would contend that it is much
more important to know what role the `-10` serves than explaining where
the role is acted out.

> + *
>   * If no_checkout is non-zero, the bisection process does not
>   * checkout the trial commit but instead simply updates BISECT_HEAD.
>   */
> diff --git a/builtin/bisect--helper.c b/builtin/bisect--helper.c
> index 5e0f759d50..29bbc1573b 100644
> --- a/builtin/bisect--helper.c
> +++ b/builtin/bisect--helper.c
> @@ -8,6 +8,7 @@
>  #include "run-command.h"
>  #include "prompt.h"
>  #include "quote.h"
> +#include "revision.h"
>
>  static GIT_PATH_FUNC(git_path_bisect_terms, "BISECT_TERMS")
>  static GIT_PATH_FUNC(git_path_bisect_expected_rev, "BISECT_EXPECTED_REV")
> @@ -29,6 +30,8 @@ static const char * const git_bisect_helper_usage[] = {
>  	N_("git bisect--helper --bisect-terms [--term-good | --term-old | --term-bad | --term-new]"),
>  	N_("git bisect--helper --bisect-start [--term-{old,good}=<term> --term-{new,bad}=<term>]"
>  					     "[--no-checkout] [<bad> [<good>...]] [--] [<paths>...]"),
> +	N_("git bisect--helper --bisect-next"),
> +	N_("git bisect--helper --bisect-auto-next"),
>  	NULL
>  };
>
> @@ -421,6 +424,157 @@ static int bisect_append_log_quoted(const char **argv)
>  	return res;
>  }
>
> +static int register_good_ref(const char *refname,
> +			     const struct object_id *oid, int flags,
> +			     void *cb_data)
> +{
> +	struct string_list *good_refs = cb_data;
> +	string_list_append(good_refs, oid_to_hex(oid));
> +	return 0;
> +}
> +
> +static void prepare_rev_argv(struct bisect_terms *terms, struct argv_array *rev_argv)
> +{
> +	struct string_list good_revs = STRING_LIST_INIT_DUP;
> +	char *term_good = xstrfmt("%s-*", terms->term_good);
> +
> +	for_each_glob_ref_in(register_good_ref, term_good,
> +			     "refs/bisect/", &good_revs);
> +
> +	argv_array_pushl(rev_argv, "skipped_commits", "refs/bisect/bad", "--not", NULL);
> +	for (int i = 0; i < good_revs.nr; i++)
> +		argv_array_push(rev_argv, good_revs.items[i].string);
> +
> +	string_list_clear(&good_revs, 0);
> +	free(term_good);
> +}

Maybe we should fold that into `prepare_revs()`? We could then render the
arguments directly into `revs` (via `add_pending_object()`, after setting
obj->flags |= UNINTERESTING`) rather than formatting them into a string
list, then deep-copy them into an `argv_array` only to parse them back
into OIDs that we already had in the first place.

> +
> +static int prepare_revs(struct bisect_terms *terms, struct rev_info *revs)
> +{
> +	int res = 0;
> +	struct argv_array rev_argv = ARGV_ARRAY_INIT;
> +
> +	prepare_rev_argv(terms, &rev_argv);
> +
> +	/*
> +	 * It is important to reset the flags used by revision walks
> +	 * as the previous call to bisect_next_all() in turn
> +	 * setups a revision walk.
> +	 */
> +	reset_revision_walk();
> +	init_revisions(revs, NULL);
> +	rev_argv.argc = setup_revisions(rev_argv.argc, rev_argv.argv, revs, NULL);
> +	if (prepare_revision_walk(revs))
> +		res = error(_("revision walk setup failed\n"));
> +
> +	argv_array_clear(&rev_argv);
> +	return res;
> +}
> +
> +static int process_skipped_commits(FILE *fp, struct bisect_terms *terms, struct rev_info *revs)
> +{
> +	struct commit *commit;
> +	struct pretty_print_context pp = {0};
> +
> +	if (fprintf(fp, "# only skipped commits left to test\n") < 1)
> +		return -1;
> +
> +	while ((commit = get_revision(revs)) != NULL) {
> +		struct strbuf commit_name = STRBUF_INIT;
> +		format_commit_message(commit, "%s",
> +				      &commit_name, &pp);
> +		fprintf(fp, "# possible first %s commit: [%s] %s\n",
> +			terms->term_bad, oid_to_hex(&commit->object.oid),
> +			commit_name.buf);
> +		strbuf_release(&commit_name);
> +	}

In the interest of allowing further revision walks, we will probably need
to re-set the flags via `clear_commit_marks()`, just like
`check_ancestors()` does.

> +
> +	return 0;
> +}
> +
> +static int bisect_skipped_commits(struct bisect_terms *terms)
> +{
> +	int res = 0;
> +	FILE *fp = NULL;
> +	struct rev_info revs;
> +
> +	fp = fopen(git_path_bisect_log(), "a");
> +	if (!fp)
> +		return error_errno(_("could not open '%s' for appending"),
> +				  git_path_bisect_log());
> +
> +	res = prepare_revs(terms, &revs);
> +
> +	if (!res)
> +		res = process_skipped_commits(fp, terms, &revs);
> +
> +	fclose(fp);
> +	return res;
> +}

This is again a very short wrapper around another function, so it will
probably make sense to merge the two, otherwise the boilerplate might very
well outweigh the actual code doing actual work.

> +
> +static int bisect_successful(struct bisect_terms *terms)
> +{
> +	FILE *fp = NULL;
> +	struct object_id oid;
> +	struct commit *commit;
> +	struct pretty_print_context pp = {0};
> +	struct strbuf commit_name = STRBUF_INIT;
> +	char *bad_ref = xstrfmt("refs/bisect/%s",
> +				terms->term_bad);
> +	int res = 0;
> +
> +	read_ref(bad_ref, &oid);
> +	printf("%s\n", bad_ref);
> +	commit = lookup_commit_reference(the_repository, &oid);
> +	format_commit_message(commit, "%s", &commit_name, &pp);
> +

There is a trailing tab here. Maybe it would make sense to check the
patches via `git log --check`?

> +	fp = fopen(git_path_bisect_log(), "a");
> +	if (fp) {
> +		if (fprintf(fp, "# first %s commit: [%s] %s\n",
> +			    terms->term_bad, oid_to_hex(&oid),
> +			    commit_name.buf) < 1)
> +			res = -1;

This would probably do with an error message, i.e. `res =
error_errno(...);`

> +		fclose(fp);
> +	} else {
> +		res = error_errno(_("could not open '%s' for "
> +				    "appending"),
> +				  git_path_bisect_log());
> +	}

This pattern of opening a file, writing something into it, and then return
success, otherwise failure, seems like a repeated pattern. In other words,
it would be a good candidate for factoring out into its own function.

> +	strbuf_release(&commit_name);
> +	free(bad_ref);
> +	return res;
> +}
> +
> +static int bisect_next(struct bisect_terms *terms, const char *prefix)
> +{
> +	int res, no_checkout;
> +
> +	if (bisect_next_check(terms, terms->term_good))
> +		return -1;
> +
> +	no_checkout = !is_empty_or_missing_file(git_path_bisect_head());
> +
> +	/* Perform all bisection computation, display and checkout */
> +	res = bisect_next_all(the_repository, prefix, no_checkout);
> +
> +	if (res == -10) {
> +		res = bisect_successful(terms);
> +		return res ? res : -11;
> +	} else if (res == -2) {
> +		res = bisect_skipped_commits(terms);
> +		return res ? res : -2;
> +	}

I know exactly what I'll think if I see those constants six months from
now, when I forgot most of the details of our conversation over here. A
-10 means.. wait, what?

Seriously, it is quite bad to keep those constants unexplained.

In contrast, look at this here code:

	enum scld_error {
		SCLD_OK = 0,
		SCLD_FAILED = -1,
		SCLD_PERMS = -2,
		SCLD_EXISTS = -3,
		SCLD_VANISHED = -4
	};
	enum scld_error safe_create_leading_directories(char *path);

What do you think? Will any reader stumble over this and say "what the
heck is going on? What are these return values even _supposed_ to mean?"?

Even better, it seems as if modern debuggers can figure out that a value
-4 returned from `safe_create_leading_directories()` actually mean
`SCLD_VANISHED` and display that to the user.

So armed with this example, you could of course go back to your mentor and
ask for permission to change the bisect code accordingly.

You could also just decide on your own that this is what you want to do
because it is so much more elegant, anyway.

> +	return res;
> +}
> +
> +static int bisect_auto_next(struct bisect_terms *terms, const char *prefix)
> +{
> +	if (!bisect_next_check(terms, NULL))
> +		return bisect_next(terms, prefix);
> +
> +	return 0;

A common pattern in Git's source code is to present the early return
first, i.e.

	if (bisect_next_check(terms, NULL))
		return 0;

	return bisect_next(terms, prefix);

I do find it easier to read that way, too.

> +
>  static int bisect_start(struct bisect_terms *terms, int no_checkout,
>  			const char **argv, int argc)
>  {
> @@ -625,7 +779,9 @@ int cmd_bisect__helper(int argc, const char **argv, const char *prefix)
>  		CHECK_AND_SET_TERMS,
>  		BISECT_NEXT_CHECK,
>  		BISECT_TERMS,
> -		BISECT_START
> +		BISECT_START,
> +		BISECT_NEXT,
> +		BISECT_AUTO_NEXT,
>  	} cmdmode = 0;
>  	int no_checkout = 0, res = 0, nolog = 0;
>  	struct option options[] = {
> @@ -649,6 +805,10 @@ int cmd_bisect__helper(int argc, const char **argv, const char *prefix)
>  			 N_("print out the bisect terms"), BISECT_TERMS),
>  		OPT_CMDMODE(0, "bisect-start", &cmdmode,
>  			 N_("start the bisect session"), BISECT_START),
> +		OPT_CMDMODE(0, "bisect-next", &cmdmode,
> +			 N_("find the next bisection commit"), BISECT_NEXT),
> +		OPT_CMDMODE(0, "bisect-auto-next", &cmdmode,
> +			 N_("verify the next bisection state then checkout the next bisection commit"), BISECT_AUTO_NEXT),
>  		OPT_BOOL(0, "no-checkout", &no_checkout,
>  			 N_("update BISECT_HEAD instead of checking out the current commit")),
>  		OPT_BOOL(0, "no-log", &nolog,
> @@ -710,6 +870,18 @@ int cmd_bisect__helper(int argc, const char **argv, const char *prefix)
>  		set_terms(&terms, "bad", "good");
>  		res = bisect_start(&terms, no_checkout, argv, argc);
>  		break;
> +	case BISECT_NEXT:
> +		if (argc)
> +			return error(_("--bisect-next requires 0 arguments"));
> +		get_terms(&terms);
> +		res = bisect_next(&terms, prefix);
> +		break;
> +	case BISECT_AUTO_NEXT:
> +		if (argc)
> +			return error(_("--bisect-auto-next requires 0 arguments"));
> +		get_terms(&terms);
> +		res = bisect_auto_next(&terms, prefix);
> +		break;
>  	default:
>  		return error("BUG: unknown subcommand '%d'", cmdmode);
>  	}
> diff --git a/git-bisect.sh b/git-bisect.sh
> index efee12b8b1..7531b74708 100755
> --- a/git-bisect.sh
> +++ b/git-bisect.sh
> @@ -87,7 +87,7 @@ bisect_start() {
>  	# Check if we can proceed to the next bisect state.
>  	#
>  	get_terms
> -	bisect_auto_next
> +	git bisect--helper --bisect-auto-next || exit
>
>  	trap '-' 0
>  }
> @@ -140,45 +140,7 @@ bisect_state() {
>  	*)
>  		usage ;;
>  	esac
> -	bisect_auto_next
> -}
> -
> -bisect_auto_next() {
> -	git bisect--helper --bisect-next-check $TERM_GOOD $TERM_BAD && bisect_next || :
> -}
> -
> -bisect_next() {
> -	case "$#" in 0) ;; *) usage ;; esac
> -	bisect_autostart
> -	git bisect--helper --bisect-next-check $TERM_GOOD $TERM_BAD $TERM_GOOD|| exit
> -
> -	# Perform all bisection computation, display and checkout
> -	git bisect--helper --next-all $(test -f "$GIT_DIR/BISECT_HEAD" && echo --no-checkout)
> -	res=$?
> -
> -	# Check if we should exit because bisection is finished
> -	if test $res -eq 10
> -	then
> -		bad_rev=$(git show-ref --hash --verify refs/bisect/$TERM_BAD)
> -		bad_commit=$(git show-branch $bad_rev)
> -		echo "# first $TERM_BAD commit: $bad_commit" >>"$GIT_DIR/BISECT_LOG"
> -		exit 0
> -	elif test $res -eq 2
> -	then
> -		echo "# only skipped commits left to test" >>"$GIT_DIR/BISECT_LOG"
> -		good_revs=$(git for-each-ref --format="%(objectname)" "refs/bisect/$TERM_GOOD-*")
> -		for skipped in $(git rev-list refs/bisect/$TERM_BAD --not $good_revs)
> -		do
> -			skipped_commit=$(git show-branch $skipped)
> -			echo "# possible first $TERM_BAD commit: $skipped_commit" >>"$GIT_DIR/BISECT_LOG"
> -		done
> -		exit $res
> -	fi
> -
> -	# Check for an error in the bisection process
> -	test $res -ne 0 && exit $res
> -
> -	return 0
> +	git bisect--helper --bisect-auto-next

Beautiful.

>  }
>
>  bisect_visualize() {
> @@ -232,7 +194,7 @@ bisect_replay () {
>  			die "$(gettext "?? what are you talking about?")" ;;
>  		esac
>  	done <"$file"
> -	bisect_auto_next
> +	git bisect--helper --bisect-auto-next
>  }
>
>  bisect_run () {
> @@ -329,7 +291,8 @@ case "$#" in
>  		bisect_skip "$@" ;;
>  	next)
>  		# Not sure we want "next" at the UI level anymore.
> -		bisect_next "$@" ;;
> +		get_terms

I vaguely remember that we talked about this, or at least about a similar
scenario. It needs to be explained in the commit message why we need to
call `get_terms` here when previously, we did not.

Of course, after thinking about this and looking around for a couple of
minutes, I know why. My point is that I, or for that matter, any reader of
this commit, should not need to repeat that analysis.

Other than that, the patch looks good.

As I said, I will stop reviewing the remainder of this patch series, as it
has been removed from v2 and will probably be presented as a follow-up
patch series soon.

Thanks,
Dscho

> +		git bisect--helper --bisect-next "$@" || exit ;;
>  	visualize|view)
>  		bisect_visualize "$@" ;;
>  	reset)
> --
> 2.21.1 (Apple Git-122.3)
>
>