Re: [PATCH] submodule: Fetch the direct sha1 first

Junio C Hamano <gitster@xxxxxxxxx> · Fri, 19 Feb 2016 16:11:55 -0800

Stefan Beller <sbeller@xxxxxxxxxx> writes:

> On Fri, Feb 19, 2016 at 2:29 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>> Stefan Beller <sbeller@xxxxxxxxxx> writes:
>>
>>> Doing a 'git fetch' only and not the fetch for the specific sha1 would be
>>> incorrect?
>>
>> I thought that was what you are attempting to address.
>
> Yep. In an ideal world I would imagine it would look like
>
>     if $sha1 doesn't exist:
>         fetch $sha1
>         if server did not support fetching direct sha1:
>             fallback to fetch <no args>

It should look more like this:

	if $sha1's history and objects are incomplete:
		fetch ;# normally just like we have done before
                if $sha1's history and objects are still incomplete:
			fetch $sha1

as existing users already expect that commits and objects that are
reachable from tips of refs configured to be fetched in the
submodule via its configured refspecs are available after this part
of the code runs, regardless of this "Gerrit reviews may not have
arrived to branches yet" issue.  The first "normal" fetch ensures
that the expectation is met.

> Would it make sense in case of broken histories to not fetch
> (specially if the user asked to not fetch) and rather repair by
> making it a shallow repository?

Commits whose ancestors, trees and/or blobs are incomplete can and
do exist in a perfectly healthy repository and there is no breakage
in the history as long as such commits are not reachable from any of
the refs.

You can for example make a small fetch from 'pu' today, that results
in unpack-objects to be run instead of index-pack, and then make
another fetch from 'pu', making these loose objects unreachable from
anywhere.  Maybe there were 5 commits worth of objects in the
original transfer, and the objects necessary for the bottom 2 were
pruned away while the tip one still in the repository [*1*].

"cat-file -e" may find that the tip commit is there, but "rev-list
--objects $oldtip --not --all" will find that the old tip of pu that
is left behind is incomplete and cannot be safely used (e.g. "git
log -p" would fail).  The "$sha1's history and objects are
incomplete" check aka "quickfetch()" test is a way to avoid getting
fooled by an object that passes "cat-file -e" test.

I am not sure if it is feasible, given such an island of commits and
associated objects, to craft a proper "shallow" boundary after the
fact.  It should be doable with extra code, but I do not think there
is a canned support at the plumbing level (you can obviously
construct it with "cat-file -e" and following the inter object links
yourself).

This "fetch" is in a cmd_update() codepath, whose purpose is "Update
each submodule path to correct revision, using clone and checkout as
needed", so I am not sure "to not fetch, specically if the user
asked to not fetch" makes much sense in the first place.

[Footnote]

*1* A canonical example used to be a commit walker that fetches from
    the tip of a branch that is ahead of us by 5 commits, gets
    killed after fetching and storing the tip commit object and some
    of its trees and blobs, and before successfully fetching and
    storing all the necessary objects, e.g. the parent commits and
    its trees and blobs.  That would leave a disconnected island of
    objects that are not anchored by any ref.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html