Re: [RFC] On the --depth argument when fetching with submodules

On Sun, Feb 7, 2016 at 5:32 AM, Lars Schneider <larsxschneider@xxxxxxxxx> wrote:
>
> On 06 Feb 2016, at 01:05, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
>> Stefan Beller <sbeller@xxxxxxxxxx> writes:
>>
>>> Currently when cloning a project, including submodules, the --depth argument
>>> is passed on recursively, i.e. when cloning with "--depth 2", both the
>>> superproject as well as the submodule will have a depth of 2.  It is not
>>> guaranteed that the commits as specified by the superproject are included
>>> in these 2 commits of the submodule.
>>>
>>> Illustration:
>>> (superproject with depth 2, so A would have more parents, not shown)
>>>
>>> superproject/master: A <- B
>>>                    /      \
>>> submodule/master:  C <- D <- E <- F <- G
>>>
>>> (Current behavior is to fetch G and F)
>>
>> I think the issue is deeper than merely "--depth 2", and you would
>> be better off stepping back and thinking about various use cases to
>> make sure that we know what kind of behaviour we want to support
>> before delving into one particular corner case.  We currently pass
>> the depth recursively, and I do not think it makes much sense, but I
>> view it as a secondary question "among the behaviours we want to
>> support, which one should be the default?"  It may turn out that not
>> passing it recursively at all, or even passing a different depth, is
>> a better default, but we wouldn't know until we know what the
>> desirable behaviours are in various workflows.
>>
>> If you are actively working on the superproject plus some submodules
>> but you are merely using the submodule you depicted above, not
>> working on changing it, even when you want the full history of the
>> superproject (i.e. no "--depth 2"), you may not want history of the
>> submodule.  Even though we have a way to say "I am not interested in
>> this submodule AT ALL" by not doing "submodule init", not having
>> anything at all at the path submodule/ may not allow you to build
>> the whole thing, and we currently lack a way to express "I am not
>> interested in the history of this thing, but I need at least the
>> tree that matches the commit referred to by the superproject".
>>
>> If you are working on a single submodule, trying to fix a bug in the
>> context of the whole project, you might want to have a single-depth
>> clone of the superproject and all other submodules, plus the whole
>> history of the single submodule.
>>
>> In either of these examples, the top-level "--depth" does not have
>> much to do with what depth the user wants to use when cloning or
>> fetching the submodule repositories.
>>
>> I have a feeling (but I would not be surprised if somebody who uses
>> submodules heavily has a counter-example from real life) that
>> regardless of "--depth" or full clone, fetching the tip of matching
>> branch is not a good default behaviour.  In your picture, even when
>> depth is not given at all, there isn't much point fetching F or G.
>
> I really wonder in what cases people use the "--depth" option, too.
> For instance I have never used it in either one of the two cases you
> described above. I don't worry about a long running "clone" as it
> usually is a one-time operation.

I think there are 3 use cases.

1) You work on the superproject and don't care about the submodules.
In this case you want the superproject non-shallow and the submodules
may be just fine with depth 1. (Think of libraries pulled in via Git instead
of via the build system)

2) The superproject is a collection of submodules, i.e. not much content
in the superproject except for the submodules. You want to work
in the submodules, i.e. you want the superproject shallow, and all
submodules deep.

3) Same as 2, but you're interested in only one (or a few) submodules,
which means you want superproject and most of the other submodules
shallow, but one submodule needs to be deep.

So covering 1 and 2 is easy; 3 is complicated.
For 1) we can make it so that the depth argument is not passed on
and the fetch only covers the referenced submodule commits, and then we
introduce another switch, "--submodule-depth", to cover 2).

For 3) we don't know which submodules the user is interested in,
so the user needs to unshallow the interesting submodules themselves
after doing a "--depth 1 --submodule-depth 1" clone. "--depth 1" sort of
implies "--submodule-depth 1", though.
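
To make the difference concrete, here is a small Python model of the
illustration quoted above. It is only a sketch: the commit names come from
the diagram, while the function and variable names are made up for this
example and do not correspond to anything in Git. It compares what the
current behaviour fetches with what one possible alternative (fetching the
commit recorded by the superproject tip at depth 1) would fetch.

```python
# Toy model of the illustration; nothing here is a real Git API.

# Submodule history: each commit maps to its single parent (C is the oldest shown).
SUBMODULE_PARENT = {"G": "F", "F": "E", "E": "D", "D": "C", "C": None}

# Superproject commits present with depth 2, and the submodule commit each records.
GITLINKS = {"A": "C", "B": "E"}


def shallow_history(tip, depth):
    """Return the commits obtained by fetching `tip` truncated to `depth`."""
    commits = []
    current = tip
    while current is not None and len(commits) < depth:
        commits.append(current)
        current = SUBMODULE_PARENT[current]
    return commits


# Current behaviour: the superproject depth is reused on the submodule branch tip.
current_behaviour = shallow_history("G", 2)
missing = sorted(set(GITLINKS.values()) - set(current_behaviour))

# Assumed alternative: fetch the commit recorded by superproject tip B, depth 1.
alternative = shallow_history(GITLINKS["B"], 1)

print(current_behaviour)  # ['G', 'F']
print(missing)            # ['C', 'E']
print(alternative)        # ['E']
```

In this model neither commit referenced by the superproject is present after
the current fetch, which is the failure described in the quoted text.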

>
> However, in case of a continuous integration system that starts with
> a clean state in the beginning of every run (e.g. Travis CI) a
> "clone" operation is no longer a one-time operation. In this case the
> "--depth 1" option makes very much sense to me. This was the situation
> where I realized the problem that Stefan wants to tackle here and I
> tried to make it tangible with a test case [1].

Thanks for the test! The hard part is making it work in a backwards
compatible way. Instead of the branch, you can just pass a sha1 to
git fetch, and it sometimes works (if the server permits fetching
arbitrary or hidden sha1s; though Jonathan noted this check may be in
the client only, with the server trusting the client not to ask for
arbitrary sha1s?).

So for fetching I think we need to have a "--try-to-get-commit <sha1>"
argument for fetch which, depending on the server capabilities and
the history obtained otherwise, may try again to fetch the exact sha1.
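
The fallback I have in mind could look roughly like the Python sketch
below. Note that "--try-to-get-commit" is only a proposal in this mail, and
everything in the sketch (the FakeServer class, its fields, the function
name, the status strings) is hypothetical and stands in for the real
client/server negotiation.

```python
from dataclasses import dataclass


@dataclass
class FakeServer:
    """Stand-in for a remote; not a real Git API."""
    branch_history: list             # commits a shallow branch fetch would return
    all_commits: set                 # every commit the server has
    allows_sha1_wants: bool = False  # whether the server permits fetching by sha1


def fetch_with_fallback(server, wanted_sha1):
    """Fetch the branch first; ask for the exact sha1 only if it is still missing."""
    obtained = set(server.branch_history)
    if wanted_sha1 in obtained:
        return obtained, "found in branch history"
    if server.allows_sha1_wants and wanted_sha1 in server.all_commits:
        obtained.add(wanted_sha1)
        return obtained, "fetched by sha1"
    return obtained, "unavailable"


permissive = FakeServer(["G", "F"], set("CDEFG"), allows_sha1_wants=True)
strict = FakeServer(["G", "F"], set("CDEFG"))

print(fetch_with_fallback(permissive, "E")[1])  # fetched by sha1
print(fetch_with_fallback(strict, "E")[1])      # unavailable
print(fetch_with_fallback(strict, "F")[1])      # found in branch history
```

The "unavailable" case is the one where we would still have to fall back to
something else, e.g. deepening the submodule history.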


>
> On top of that I think Git's error message is really confusing if
> you clone a repo with "--depth" that has submodules and Git is not
> fetching the necessary submodule commits:
>
> Unable to checkout '$SHA' in submodule path 'path/to/submodule'
>
> I tried to tackle that with [2] which would detect this case and
> print the following error instead (slightly changed from the patch):
>
> Unable to checkout '$SHA' in submodule path '/path/to/commit'.
> Try to remove the '--depth' argument on clone!
>
> [1] https://www.mail-archive.com/git%40vger.kernel.org/msg82614.html
> [2] https://www.mail-archive.com/git%40vger.kernel.org/msg82613.html
>
>
>>
>>> So to fetch the correct submodule commits, we need to
>>> * traverse the superproject and list all submodule commits.
>>> * fetch these submodule commits (C and E) by sha1
>>
>> I do not think requiring C to be fetched when the superproject
>> is cloned with --depth=2 (hence A and B are present in the result)
>> is a good definition of "correct submodule commits".  The initial
>> clone could be "superproject follows --depth, all submodules are
>> cloned with --depth=1 at the commits referenced by the superproject
>> tree"--by that definition, you need E but you do not want C.
>>
>> As a specification of the behaviour, the above two might work, but I
>> do not think that should be the implementation.  In other words,
>> "The implementation should behave as if it did the above two" is OK,
>> and it is also OK to qualify with further conditions to help the
>> implementation.  For example, the current structure assumes that E
>> and C are reachable from "some" ref in submodule, so that at least a
>> whole clone of the submodule would give them to you--otherwise you
>> would not be able to even build the superproject at A or B.  Perhaps
>> it is OK to further require that, when you are working in a single
>> branch mode and working on 'master', you are required to have
>> commits C and E reachable on the 'master' branch in the submodule,
>> and that may let you limit the need for such scanning of the
>> history?
>> --
>> To unsubscribe from this list: send the line "unsubscribe git" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>