Re: [PATCH] submodule: Fetch the direct sha1 first

Junio C Hamano <gitster@xxxxxxxxx> · Fri, 19 Feb 2016 14:29:04 -0800

Stefan Beller <sbeller@xxxxxxxxxx> writes:

> Doing a 'git fetch' only and not the fetch for the specific sha1 would be
> incorrect?

I thought that was what you are attempting to address.

> ('git fetch' with no args finishes successfully, so no fallback is
> triggered. But we are not sure if we obtained the sha1, so we need to
> check if we have the sha1 by doing a local check and then try to get the sha1
> again if we don't have it locally.

Yes, that is what I meant in the "In the opposite fallback order"
suggestion.
>>>                               (clear_local_git_env; cd "$sm_path" &&
>>> +                                     remote_name=$(get_default_remote)
>>>                                       ( (rev=$(git rev-list -n 1 $sha1 --not --all 2>/dev/null) &&
>>> -                                      test -z "$rev") || git-fetch)) ||
>>> +                                      test -z "$rev") || git-fetch $remote_name $rev
>>
>> Regardless of the "fallback order" issue, I do not think $rev is a
>> correct thing to fetch here.  The superproject binds $sha1 to its
>> tree, and you would be checking that out, so shouldn't you be
>> fetching that commit?
>
> Both $sha1 and $rev are in the submodule (because
> 'git submodule--helper list' puts out the sha1 as the
> submodule sha1). $rev is either empty or equal to $sha1
> in my understanding of "rev-list $sha1 --not --all".

Not quite.  The rev-list command expects [*1*] one of three outcomes
in the original construct:

 * The repository does not know anything about $sha1; the command
   fails, rev is left empty, but thanks to &&, git-fetch runs.

 * The repository has $sha1 but the history behind it is not
   complete.  While digging from $sha1 following the parent chain,
   it would hit a missing object and fails, rev may or may not be
   empty, but thanks to &&, git-fetch runs.

 * The repository has $sha1 and its history is all connected.  The
   command succeeds.  If $sha1 is not connected to any of the refs,
   however, that commit may be shown and stored in $rev.  In this
   case, "$rev" happens to be the same as "$sha1".

As this "fetch" is run in order to make sure that the history behind
$sha1 is complete in the submodule repository, so that detaching the
HEAD at that commit will give the user a useful repository and its
working tree, the check the code is doing in the original is already
flawed.  If $sha1 and its ancestry is complete in the repository,
rev-list would succeed, and if $sha1 is ahead of any of the refs,
the original code still runs "git fetch", which is not necessary for
the purpose of detaching the head at $sha1.  On the other hand, by
using "-n 1", it can cause rev-list stop before discovering a gap in
history behind $sha1, allowing "git fetch" to be skipped when it
should be run to fill the gap in the history.

To be complete, the rev-list command line should also run with
"--objects"; after all, a commit walker fetch may have downloaded
commit chain completely but haven't fetched necessary trees and
blobs when it was killed, and "rev-list $sha1 --not --all" would not
catch such a breakage without "--objects".

> Oh! Looking at that I suspect the
> "test -z $(git rev-list -n 1 $sha1 --not --all 2>/dev/null)"
> and "git cat-file -e" are serving the same purpose here and should just
> indicate if the given sha1 is present or not.

That is the simplest explanation why the original "rev-list"
invocation is already wrong.  It should do an equivalent of
builtin/fetch.c::quickfetch() to ensure that $sha1 is something that
is complete, i.e. could be anchored with a ref if we wanted to,
before deciding to avoid running "git fetch".

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html