Re: [RFC/PATCH] pull: set-upstream implementation

Junio C Hamano <gitster@xxxxxxxxx> · Wed, 25 May 2016 11:09:54 -0700

Erwan Mathoniere <erwan.mathoniere@xxxxxxxxxxxxxxxx> writes:

> Subject: Re: [RFC/PATCH] pull: set-upstream implementation

If this were multi-patch series and one of the other patches were
"pull: set-upstream design" or something, then it might have been
understandable, but otherwise the word "implementation" sits rather
strangely in the title of this patch.

> Implements pull [--set-upstream | -u] which sets the remote tracking
> branch to the one the user just pulled from.
>
> 	git pull [--set-upstream | -u] <remote> <branch>
>
> After successfully fetched from <remote>, sets branch.<name>.remote to
> <remote> and branch.<name>.merge to follow <remote>/<branch>
>
> Signed-off-by: Erwan Mathoniere <erwan.mathoniere@xxxxxxxxxxxxxxxx>
> Signed-off-by: Jordan De Gea <jordan.de-gea@xxxxxxxxxxxxxxxx>
> Signed-off-by: Matthieu Moy <matthieu.moy@xxxxxxxxxxxxxxx>
> ---
> Hello,
>
> `git push --set-upstream <remote> <branch>` sets the remote and merge config
> variables for <branch> after successfully pushed.
>
> This patch implements an equivalent for `git pull` or
> `git branch --set-upstream-to=<remote>/<branch>`.
>
> Difficulties:
> 	- I can't figure out what remote branch sould be tracked
> 	in that case: `git pull -u origin :master`

What does the command do without "-u"?

> 	- Is the result of 'resolve_ref_unsafe' should be freed ?

Check its function signature--it returns "const char *" which is a
sign that the memory does not belong to you (i.e. the caller), and
you should never free it.

>  Documentation/git-pull.txt |  7 +++++
>  builtin/pull.c             | 61 +++++++++++++++++++++++++++++++++++++
>  t/t5544-pull-upstream.sh   | 75 ++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 143 insertions(+)
>  create mode 100755 t/t5544-pull-upstream.sh
>
> diff --git a/Documentation/git-pull.txt b/Documentation/git-pull.txt
> index d033b25..3a2e0b7 100644
> --- a/Documentation/git-pull.txt
> +++ b/Documentation/git-pull.txt
> @@ -93,6 +93,13 @@ OPTIONS
>  	has to be called afterwards to bring the work tree up to date with the
>  	merge result.
>  
> +-u::
> +--set-upstream::
> +	For each branch that is successfully pulled, add
> +	upstream (tracking) reference, used by argument-less
> +	linkgit:git-push[1] and other commands. For more information,
> +	see 'branch.<name>.merge' in linkgit:git-config[1].

To me, "For each branch" hints that I could do this:

	git pull --set-upstream git://git.kernel.org/r/e/p/o foo bar

and the command would do something to 'foo' and 'bar'.

But I suspect that is not what is going on and that is not what you
wanted to achieve.

A crucial piece of information that is lacking in the above is where
<name> comes from.  The same issue exists in your proposed commit
log message.

I think that you meant to add a feature to add branch.<name>.remote
and branch.<name>.merge configuration variables for the current
branch whose name is <name>, and the values to be recorded for these
two configuration variables are the same as those given on the
command line.  For example "git checkout -b topic master && git pull
origin that" would set "branch.topic.remote" and
"branch.topic.merge" to "origin" and "that", respectively.

Without explaining what <name> is, "For more information" that
refers to 'branch.<name>.merge' does not help the readers much.

	Side note.  It is an understandable mistake.  After one
	spent a lot of effort designing, implementing and debugging
	a feature, by the time one describes what it does, some
	assumptions one made earlier becomes so fundamental in one's
	mind that one forgets that it is not obvious to others.

There a few design decisions you must have made that needs to be
either described fully or at least hinted here, too, such as (not
exhaustive and in random order):

 * What should happen when the current branch already has these two
   configuration variables defined?  Should the user get a warning
   when this changes the setting from what was already there?

 * What should happen when the remote is specified as a Git URL, not
   as a remote nickname?  You want a nickname for the value to place
   in "branch.<name>.remote".

    - Should Git find an existing remote that points at the URL and
      use it?  If so, and if the value we are about to place in
      "branch.<name>.merge" is outside its configured fetch refspec,
      should Git tweak the fetch refspec of the existing remote?

    - Should Git create a new remote nickname for the URL?  If so,
      what should the fetch refspec be for the newly created remote?
      Should it fetch only the branch we fetched just now?

 * What should happen when more than one ref is given?
   branch_get_upstream() can return only one ref; should Git choose
   one of them at random?  Should Git warn and turn this into a
   no-op?

 * What should happen when the ref given to pull is not a branch
   (e.g. a tag)?  A tag, meant to be a stable anchoring point, is
   not something suitable to be set as branch.<name>.merge, even
   though it might be technically possible to configure it as such.
   Should Git warn and turn this into a no-op?

> diff --git a/builtin/pull.c b/builtin/pull.c
> index 1d7333c..e096858 100644
> --- a/builtin/pull.c
> +++ b/builtin/pull.c
> @@ -829,6 +836,57 @@ static int run_rebase(const unsigned char *curr_head,
>  	return ret;
>  }
>  
> +static int set_pull_upstream(const char *repo, const char **refspecs_name)
> +{
> +	unsigned char sha1[GIT_SHA1_RAWSZ];
> +	const char *local_branch, *remote_branch;
> +	struct strbuf remote_name;
> +	struct refspec *refspecs;

Style.  Name a pointer that points at the first element of an array
as singular, so that "element[4]", not "elements[4]", becomes the
way to refer to its fourth element.

> +	int nr_refspec, i;
> +
> +	if (repo == NULL) {
> +		warning(N_("no remote was specified"));
> +		return 1;
> +	}
> +
> +	for (nr_refspec = 0; refspecs_name[nr_refspec] != NULL; nr_refspec++);

Style.  Give an empty statement its own line to make it stand out,
i.e.

	for (....)
        	; /* just counting */

> +	if (nr_refspec == 0) {
> +		warning(N_("no refspec was specified"));
> +		return 1;
> +	}

OK, this is "git pull -u <REMOTE>" without any explicit branch name;
the pull may have already used configured one (or it may have fetched
nothing); there is nothing to do here either way.

> +	refspecs = parse_fetch_refspec(nr_refspec, refspecs_name);
> +
> +	for (i = 0; i < nr_refspec; i++) {
> +		if (refspecs[i].src != NULL && strlen(refspecs[i].src) > 0) {

Style.  Lose the " != NULL" and useless strlen().  I.e.

	if (refspec[i].src && *refspec[i].src)

More importantly, what would you see here in .src when your command
line were:

	$ git pull -u <REMOTE> "refs/heads/*:refs/remotes/foo/*"

Hint: check .pattern to make sure it is false.  Skip them at the top
of the loop body, perhaps.

> +			remote_branch = refspecs[i].src;
> +		} else {
> +			warning(N_("not yet implemented"));

What do you plan to implement here?

> +			continue;
> +		}
> +
> +		if (refspecs[i].dst != NULL && strlen(refspecs[i].dst) > 0) {
> +			local_branch = refspecs[i].dst;
> +		} else {
> +			// TODO : Should it be freed ?

Style.  No "// comment".

> +			local_branch = resolve_ref_unsafe("HEAD", 0, sha1, NULL);
> +			skip_prefix(local_branch, "refs/heads/", &local_branch);

What happens here if your user did this?

	$ git checkout HEAD^0 && git pull -u <REMOTE> ...

You do not have "local_branch" here.  Check the condition, warn and
turn it into no-op (I do not think "pull -u" is important enough to
fail the entire command).

> +		}
> +
> +		strbuf_init(&remote_name, strlen(repo) + strlen(remote_branch) + 1);
> +		strbuf_addf(&remote_name, "%s/%s", repo, remote_branch);

What happens here if your user did this?

	$ git pull -u git://git.kernel.org/r/e/p/o master

Specifically, what is "repo" at this point?

> +		create_branch(local_branch, local_branch, remote_name.buf, 0, 0, 0, opt_verbosity < 0, BRANCH_TRACK_OVERRIDE);

I thought you were only setting up configurations for existing
branch.  Why do you call create_branch() on the existing
local_branch, which is supposed to be the current one?

Perhaps create_branch() API is too crappy and it needs to be
properly refactored in preparation for this patch.  I dunno.

What does it mean for this loop to execute more than once, flipping
the configuration for the current branch number of times?  The last
one wins?  Does it even make sense from the end-user's point of
view?

> +
> +		strbuf_release(&remote_name);
> +	}
> +
> +	free_refspec(nr_refspec, refspecs);
> +
> +	return 0;
> +}
> +
>  int cmd_pull(int argc, const char **argv, const char *prefix)
>  {
>  	const char *repo, **refspecs;
> @@ -884,6 +942,9 @@ int cmd_pull(int argc, const char **argv, const char *prefix)
>  	if (opt_dry_run)
>  		return 0;
>  
> +	if (opt_set_upstream)
> +		set_pull_upstream(repo, refspecs);
> +
>  	if (get_sha1("HEAD", curr_head))
>  		hashclr(curr_head);
>  
> diff --git a/t/t5544-pull-upstream.sh b/t/t5544-pull-upstream.sh
> new file mode 100755
> index 0000000..38eb51d
> --- /dev/null
> +++ b/t/t5544-pull-upstream.sh
> @@ -0,0 +1,75 @@
> +#!/bin/sh
> +
> +test_description='pull with --set-upstream'
> +. ./test-lib.sh
> +. "$TEST_DIRECTORY"/lib-terminal.sh
> +
> +ensure_fresh_upstream() {

Style:

	ensure_fresh_upstream () {

> +	rm -rf parent &&
> +	git init parent &&
> +	cd parent &&
> +	(
> +		echo content >file &&
> +		git add file &&
> +		git commit -m one &&
> +		git checkout -b other &&
> +		echo content >file2 &&
> +		git add file2 &&
> +		git commit -m two &&
> +		git checkout -b master2 &&
> +		git checkout master
> +	) &&
> +	cd ..

The tests typically do

	... some stuff ... &&
        (
		cd elsewhere &&
                ... more stuff ...
	) &&
        ... yet more stuff ...

to make sure that even after any of "... more stuff ..." fails, the
caller of this sequence will find it in an expected and stable
place.  As written in your patch, if any of the "... more stuff ..."
fails, the caller will be in "parent" directory, not in the original
directory, because your "cd .." will not be exectued.

I wonder if you are completely missing the point of using a subshell
here?

> +}
> +
> +test_expect_success 'setup bare parent' '
> +	ensure_fresh_upstream &&
> +	git remote add upstream parent &&
> +	git pull upstream master
> +'
> +
> +check_config() {
> +	(echo $2; echo $3) >expect.$1

	test_write_lines "$2" "$3" >"expect.$1"

Unless you want your variable reference is split at $IFS, quote
your variables.

> +	(git config branch.$1.remote
> +	 git config branch.$1.merge) >actual.$1
> +	test_cmp expect.$1 actual.$1

It is not _wrong_ per-se, but I do not think ".$1" suffix is adding
any value.  I'd drop it if I were doing this patch.

> +}
> +
> +test_expect_success 'pull -u master' '
> +	git pull -u upstream master &&
> +	check_config master upstream refs/heads/master
> +'
> +
> +test_expect_success 'pull -u master:other' '
> +	git pull -u upstream master:other &&
> +	check_config other upstream refs/heads/master
> +'
> +
> +test_expect_success 'pull -u --dry-run other:other' '
> +	git pull -u --dry-run upstream other:other &&
> +	check_config other upstream refs/heads/master
> +'
> +
> +test_expect_success 'pull -u master2:master2 master:other' '
> +	git branch master2 &&
> +	git pull -u upstream master2:master2 master:other &&
> +	check_config master2 upstream refs/heads/master2 &&
> +	check_config other upstream refs/heads/master
> +'
> +
> +test_expect_success 'pull -u HEAD' '
> +	git clone parent son &&
> +	cd son &&
> +	git checkout -b headbranch &&
> +	git pull -u origin HEAD &&
> +	check_config headbranch origin refs/heads/master

Do not "chdir" outside a subshell.

I think the next test will not work if this test, specifically the
first two steps that creates a new repository "son" and moves into
it, fails.  Don't create such an interdependency between tests when
you can avoid it.

Hint. Immediately after creating "parent", in the same test, clone
it into "son".  Everybody needs to depend on that "setup bare
parent" test anyway, so that is not making things worse.  And then
update this test like so:

	test_expect_success 'title' '
        	(
                	cd son &&
                        git checkout -b headbranch &&
                        ...
		)
	'

making sure that outside test_expect_success block, the process will
stay at the original directory, no matter which tests fail or get
skipped.

> +'
> +
> +test_expect_success TTY 'quiet pull' '
> +	ensure_fresh_upstream &&
> +
> +	test_terminal git pull -u --quiet upstream master 2>&1 | tee output &&
> +	test_cmp /dev/null output
> +'

There is no test that makes sure that the new feature does not kick
in when it should not.  E.g. pulling from somewhere that is not a
configured remote.  For example, 

> +test_done
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html