Re: [PATCH 08/15] scalar: implement the `clone` subcommand

Johannes Schindelin <Johannes.Schindelin@xxxxxx> · Fri, 3 Sep 2021 17:20:35 +0200 (CEST)

Hi Junio,

On Wed, 1 Sep 2021, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" <gitgitgadget@xxxxxxxxx>
> writes:
>
> > +static char *remote_default_branch(const char *url)
> > +{
> > +	struct child_process cp = CHILD_PROCESS_INIT;
> > +	struct strbuf out = STRBUF_INIT;
> > +
> > +	cp.git_cmd = 1;
> > +	strvec_pushl(&cp.args, "ls-remote", "--symref", url, "HEAD", NULL);
> > +	strbuf_addstr(&out, "-\n");
>
> Is this a workaround for the problem that the first "ref:" line
> won't be found by looking for "\nref: " in the loop?  Cute, but the
> extra "-" is a bit misleading.

The `-` is actually needed because of the `ref + 1` below, over which you
stumbled.

>
> > +	if (!pipe_command(&cp, NULL, 0, &out, 0, NULL, 0)) {
> > +		char *ref = out.buf;
> > +
> > +		while ((ref = strstr(ref + 1, "\nref: "))) {
> > +			const char *p;
> > +			char *head, *branch;
> > +
> > +			ref += strlen("\nref: ");
> > +			head = strstr(ref, "\tHEAD");
> > +
> > +			if (!head || memchr(ref, '\n', head - ref))
> > +				continue;
>
> OK.  We expect "ref: " <refname> "\t" <head> "\n" where <head> is
> "HEAD" for their .git/HEAD and refs/remotes/<nick>/HEAD for their
> remote-tracking branch for the remote they call <nick>, on a single
> line.  We reject a line that is not of that shape, and we reject a
> line that is about remote-tracking branch by only looking for
> "\tHEAD". Makes sense.
>
> The strstr() goes from "ref + 1", which feels sloppy.

I would use a different adjective, one that is less judgemental in nature,
but then, you were talking about your feelings.

> When we reject the line we found that begins with "ref :", I would have
> expected that the next scan would start at the beginning of the next
> line, not from the middle of this line at the first letter 'e' in
> 'refs/heads/' on the current line "ref: refs/heads/.....".  As long as
> the current line is long enough, strstr() would not miss the beginning
> of the next line, so it might be OK.

It would even work if the current line is shorter, but as you point out:
it is wasteful. And it could be improved to be more readable. I reworked
it, and it now looks like this:

	if (!pipe_command(&cp, NULL, 0, &out, 0, NULL, 0)) {
		const char *line = out.buf;

		while (*line) {
			const char *eol = strchrnul(line, '\n'), *p;
			size_t len = eol - line;
			char *branch;

			if (!skip_prefix(line, "ref: ", &p) ||
			    !strip_suffix_mem(line, &len, "\tHEAD")) {
				line = eol + (*eol == '\n');
				continue;
			}

			eol = line + len;
			if (skip_prefix(p, "refs/heads/", &p)) {
				branch = xstrndup(p, eol - p);
				strbuf_release(&out);
				return branch;
			}

			error(_("remote HEAD is not a branch: '%.*s'"),
			      (int)(eol - p), p);
			strbuf_release(&out);
			return NULL;
		}
	}

It now parses the output line by line, looking for the expected prefix and
suffix, then verifies the ref name format, and either returns the short
branch name or errors out with the message that this is not a branch.

>
> > +			if (skip_prefix(ref, "refs/heads/", &p)) {
> > +				branch = xstrndup(p, head - p);
> > +				strbuf_release(&out);
> > +				return branch;
> > +			}
> > +
> > +			error(_("remote HEAD is not a branch: '%.*s'"),
> > +			      (int)(head - ref), ref);
> > +			strbuf_release(&out);
> > +			return NULL;
>
> OK.  Any symref whose basename is HEAD in their remote-tracking
> hierarchy would have been rejected earlier in the loop.
>
> Is there a particular reason why we return early here, instead of
> breaking out of hte loop and let the generic "failed to get" code
> path below to handle this case?

Yes, the reason is that I wanted to err on the side of caution. If the
remote repository reports a default branch that is not a default branch at
all, I do not want to pretend that things are fine and then run into
trouble later when we set up a non-branch as remote-tracking target or
something like that.

>
> > +		}
> > +	}
> > +	warning(_("failed to get default branch name from remote; "
> > +		  "using local default"));
> > +	strbuf_reset(&out);
> > +
> > +	child_process_init(&cp);
> > +	cp.git_cmd = 1;
> > +	strvec_pushl(&cp.args, "symbolic-ref", "--short", "HEAD", NULL);
> > +	if (!pipe_command(&cp, NULL, 0, &out, 0, NULL, 0)) {
> > +		strbuf_trim(&out);
> > +		return strbuf_detach(&out, NULL);
> > +	}
> > +
> > +	strbuf_release(&out);
> > +	error(_("failed to get default branch name"));
> > +	return NULL;
> > +}
>
> > +static int cmd_clone(int argc, const char **argv)
> > +{
> > +	const char *branch = NULL;
> > +	int full_clone = 0;
> > +	struct option clone_options[] = {
> > +		OPT_STRING('b', "branch", &branch, N_("<branch>"),
> > +			   N_("branch to checkout after clone")),
> > +		OPT_BOOL(0, "full-clone", &full_clone,
> > +			 N_("when cloning, create full working directory")),
> > +		OPT_END(),
> > +	};
> > +	const char * const clone_usage[] = {
> > +		N_("scalar clone [<options>] [--] <repo> [<dir>]"),
> > +		NULL
> > +	};
> > +	const char *url;
> > +	char *enlistment = NULL, *dir = NULL;
> > +	struct strbuf buf = STRBUF_INIT;
> > +	int res;
> > +
> > +	argc = parse_options(argc, argv, NULL, clone_options, clone_usage, 0);
> > +
> > +	if (argc == 2) {
> > +		url = argv[0];
> > +		enlistment = xstrdup(argv[1]);
> > +	} else if (argc == 1) {
> > +		url = argv[0];
> > +
> > +		strbuf_addstr(&buf, url);
> > +		/* Strip trailing slashes, if any */
> > +		while (buf.len > 0 && is_dir_sep(buf.buf[buf.len - 1]))
> > +			strbuf_setlen(&buf, buf.len - 1);
> > +		/* Strip suffix `.git`, if any */
> > +		strbuf_strip_suffix(&buf, ".git");
> > +
> > +		enlistment = find_last_dir_sep(buf.buf);
> > +		if (!enlistment) {
> > +			die(_("cannot deduce worktree name from '%s'"), url);
> > +		}
> > +		enlistment = xstrdup(enlistment + 1);
> > +	} else {
> > +		usage_msg_opt(_("You must specify a repository to clone."),
> > +			      clone_usage, clone_options);
> > +	}
> > +
> > +	if (is_directory(enlistment))
> > +		die(_("directory '%s' exists already"), enlistment);
> > +
> > +	dir = xstrfmt("%s/src", enlistment);
> > +
> > +	strbuf_reset(&buf);
> > +	if (branch)
> > +		strbuf_addf(&buf, "init.defaultBranch=%s", branch);
> > +	else {
> > +		char *b = repo_default_branch_name(the_repository, 1);
> > +		strbuf_addf(&buf, "init.defaultBranch=%s", b);
> > +		free(b);
>
> Doesn't "git clone" already use their HEAD without having to make an
> extra "git ls-remote" roundtrip?
>
> Ahh, you do not do "git clone"; you do "git init", set things up,
> and then "git fetch" and checkout, all manually.
>
> Which is kind of shame.
>
> I wonder if it is a cleaner implementation to give a new option to
> "git clone" that gives a command sequence (not necessarily have to
> be implemented as a shell script) that specifies necessary
> pre-configuration steps to be done before the command starts the
> transfer step.

Right. It is a shame, I agree. And it is one of the things I want to work
on, after the Scalar patch series made it into Git.

The reason why I don't want to work on this now is that I expect this
effort to result in new options for `git clone`, new options that need to
be designed well, and where I fully expect a long discussion until we
reach a consensus how these options should look like, especially since we
will need to maintain backwards-compatibility of Scalar's functionality.

Therefore, in the interest to keep the patch series relatively easy to
review, I left this in the "for later" pile, for now.

Ciao,
Dscho