Re: Pull without fetch

Junio C Hamano <gitster@xxxxxxxxx> · Tue, 09 Apr 2019 17:03:12 +0900

Damien Robert <damien.olivier.robert@xxxxxxxxx> writes:

> In the good old days `git pull` was essentially `git fetch` followed by
> `git merge FETCH_HEAD` so it was easy to separate the fetching part from
> the integrating part.
>
> But nowadays there are no easy way to do the integrating part of `git
> pull`. Indeed, `git pull` learnt to read the config values of
> `branch.<current>.rebase` and `pull.rebase`, whose possible values are
> growing. At the time of this commit, they can be 'true', 'merges',
> 'preserve' and 'interactive'.
>
> To remedy this, add a new --no-fetch option to `git pull`.

It may be better to present the whole remedy, not just this new
piece that is an ingredient of it, i.e.

	Teach a new --no-fetch option to `git pull`, so that the old
	`git fetch && inspect && git merge FETCH_HEAD` sequence can
	be generalized for workflows with newer `pull` features as a
	`git fetch && inspect && git pull --no-fetch` sequence.

or something like that, perhaps?

What worries me about this topic (before even reading the patch
text) revolves around how we guarantee that the "inspect" step in
that sequence does not affect the environment negatively.  In order
for the three-step sequence (no other options to `fetch` or `pull`
steps) to be truly equivalent to `git pull` (no other options) when
inspection is satisfactory, the "inspect" step probably should not
change the currently checked out branch; an unsplit `git pull` would
base its decision to choose which branch(es) from what remote to
fetch from based on what the currently checked-out branch is [*1*],
and it is not too implausible for `git fetch` to learn the same
trick, even though currently a `git fetch` without options seem to
always fetch from the default remote, without taking the current
branch or any other enviornmental conditions into account.  In the
worst case, after "inspect" step, `git pull --no-fetch` may not even
find a useful FETCH_HEAD; this includes a case like `git pull
--no-fetch some_other_remote` is run that is totally unrelated to
the previous `git fetch` that left the FETCH_HEAD.  The code needs
to notice strangeness like that and deal with it (most of the time,
I think erroring out saying "you did something stupid/unexpected
since you ran `git fetch`, so --no-fetch refuses to work" is
sufficient).

	Side note *1* There may be things other than what is
	currently checked out that can serve as an input to this
	decision, and moreover, we need to be prepared that other
	Git developers may want to invent other things later after
	this feature gets in that affects how `fetch` works, and
	leave the door open for them to do so without breaking the
	feature we are introducing here.

Anyway, let's see what you wrote.

> diff --git a/Documentation/git-pull.txt b/Documentation/git-pull.txt
> index 118d9d86f7..cec06bf6e3 100644
> --- a/Documentation/git-pull.txt
> +++ b/Documentation/git-pull.txt
> @@ -144,6 +144,15 @@ This option is only valid when "--rebase" is used.
>  Options related to fetching
>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~
>  
> +--fetch::
> +--no-fetch::
> +	Perform a fetch first. This option can be used to override
> +	--no-fetch.
> ++
> +With --no-fetch don't fetch before updating the branch via a merge or a
> +rebase. This can be used to review changes by doing a fetch first before
> +pulling.

Those who were in this discussion thread can guess that this is
referring to

	$ git fetch
          ... inspect ...
	$ git pull --no-fetch         

sequence, that emulates a single "git pull", but those who are
reading this without such a background probably gets lost.  Aren't
there restrictions on the first fetch done and the subsequent "pull"
that does not fetch?  For example, if you have a remote, origin, and
also pull from repositories without a nickname, doing this is not
something you would be willing to support, I presume:

         $ git fetch origin
         $ git pull --no-fetch https://github.com/gitster/git master

The reasons why I suspect you are not supporting the above are (1)
`git fetch origin` won't grab anything from my repository, and nothing
in FETCH_HEAD is useful to help the `pull` step of the sequence, and
(2) `git pull https://github.com/gitster/git master` without a set
nickname would not keep "the last time seen" value in refs/remotes/
hierarchy, so there is nothing to help --no-fetch "pretend" as if it
fetched previously [*2*].

Such a restriction must be spelled out to set the end-user
expectation for this new feature.

	Side note *2* This is about the design decision, but do we
	want this --no-fetch thing useful even for remotes without
	remote-tracking branches?  If we are willing to declare that
	we only support remotes with remote-tracking branches, then
	an alternative and probably a much better design that does
	not cause any of the "a user may do random and stupid things
	while inspecting, affecting the environment" worries I
	mentioned earlier becomes possible.  Instead of handling
	this feature at the "pull" level, teach "git fetch" a new
	option "--no-fetch" that pretends to have fetched from the
	given remote and given refspec, by looking at the tips of
	remote-tracking branches instead of talking to the remote,
	and populate FETCH_HEAD as if we got exactly the same thing
	as we saw when we last talked to the remote.  Then the
	change to "git pull --no-fetch" would be just to pass the
	option through to the underlying "git fetch".  That way, no
	matter what stupid things the user does in the intermediate
	"inspect" step, the condition that would affect `git pull`
	at the last step to decide which branches from what remote
	are used will be the same with or without `--no-fetch`, and
	we do not have to worry about restrictions on the earlier
	`git fetch` step relative to the final `git pull --no-fetch`
	step.  That would make the feature much easier to explain.

Anyway, let's keep reading.

>  /* Options passed to git-fetch */
> +static int opt_fetch = 1;
>  static char *opt_all;
>  static char *opt_append;
>  static char *opt_upload_pack;
> @@ -195,6 +196,8 @@ static struct option pull_options[] = {
>  
>  	/* Options passed to git-fetch */
>  	OPT_GROUP(N_("Options related to fetching")),
> +	OPT_BOOL(0, "fetch", &opt_fetch,
> +		N_("fetch before merging / rebasing (default)")),

Good.  The above two implements a --no-foo that is off by default
correctly.

> @@ -912,8 +915,9 @@ int cmd_pull(int argc, const char **argv, const char *prefix)
>  			oidclr(&rebase_fork_point);
>  	}
>  
> -	if (run_fetch(repo, refspecs))
> -		return 1;
> +	if (opt_fetch)
> +		if (run_fetch(repo, refspecs))
> +			return 1;

I know this is PoC, but the real implementation must have the else
clause to deal with "what if we do not fetch" case.

	if (opt_fetch) {
		if (run_fetch(repo, refspecs))
			return 1;
	} else {
		... make sure what is in FETCH_HEAD is compatible
		... with run_fetch(repo, refspecs) we are skipping
		... would have left, if we didn't skip it.
		... and otherwise barf loudly.
	}

> +test_expect_success 'With --no-fetch will not fetch but still merge pending changes' '
> +	git checkout -b nofetch master^ &&
> +	echo modified >file &&
> +	git commit -a -m modified &&
> +	git checkout copy &&
> +	git reset --hard HEAD^ &&
> +	test "$(cat file)" = file &&
> +	stat --format %Y .git/FETCH_HEAD > fetch_head_before &&

Do we use "stat(1)" elsewhere?  I think we avoid things that are not
portable.

See Documentation/CodingGuidelines for shell scripting style guides.

> +	git pull --no-fetch . nofetch &&
> +	stat --format %Y .git/FETCH_HEAD > fetch_head_after &&

I do not think you want to do this.

In "git fetch && inspect && git pull --no-fetch" sequence, the
file timestamp of FETCH_HEAD is not something you care about.  What
you care about is that what you inspected gets used in the 'pull'
step, and nothing newer or older.

A more to-the-point test arrangement would be something like

	- prepare the source repository
	- git fetch from the source repository
	- go back to the source repository and add a commit
	- run git pull --no-fetch

        - inspect the resulting merge commit.  Is the second parent
          the original commit you grabbed when you fetched?  Or is
          it the new one added to the source repository since you
          fetched?

One obvious advantage of doing so is that it is the exact motivating
use case you gave when you came to the list with this feature.
Also, it allows the "teach `git fetch --no-fetch` to pretend as if
it fetched the same thing, based on the remote-tracking branches"
implementation, by not caring the file timestamp of FETCH_HEAD.