Re: [PATCHv2] push: change submodule default to check

Stefan Beller <sbeller@xxxxxxxxxx> · Wed, 24 Aug 2016 15:37:29 -0700

On Wed, Aug 24, 2016 at 12:37 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Jeff King <peff@xxxxxxxx> writes:
>
> This seems to be dropped from the list, probably due to no "To:"
> header in the original, which led to "no", "To-header" "on" and
> "input <" on YOUR recipient list, so I am quoting it in full without
> trimming.
>
>> On Wed, Aug 24, 2016 at 10:30:17AM -0700, Stefan Beller wrote:
>>
>>> When working with submodules, it is easy to forget to push the submodules.
>>> The setting 'check', which checks if any existing submodule is present on
>>> at least one remote of the submodule remotes, is designed to prevent this
>>> mistake.
>>>
>>> Flipping the default to check for submodules is safer than the current
>>> default of ignoring submodules while pushing.
>>
>> It is safer, and that's good. But it's also slower, because it requires
>> an extra traversal of all of the pushed commits. And now people will
>> have to pay the price even if they are not using submodules at all.
>>
>> For instance, try this from a checkout of linux.git:
>>
>>   for i in no check; do
>>       rm -rf dst.git
>>       git init --bare dst.git
>>       echo "==> Pushing with submodules=$i"
>>       time git push --recurse-submodules=$i dst.git HEAD
>>   done
>>
>> The second case takes 30-40 seconds longer. This is a full push of
>> history, so it's an extreme case[1], but it's still rather unfortunate.
>>
>> Can we tie this default to some sign that submodules are actually in
>> use? I don't think the presence of .gitmodules is perfect (because you
>> might be in a bare repo, for example, and have just fetched some other
>> history you are relaying), but it may be a good compromise.  I'm
>> envisioning something like "--recurse-submodules=auto-check" which
>> auto-detects common situations (e.g., presence of .gitmodules or
>> .git/modules checkouts) and enables "check", and then setting the
>> default to that in the long run.
>>
>> -Peff
>>
>> [1] Actually, there is another much worse case lurking there. Try:
>>
>>       git push --recurse-submodules=check --mirror dst.git
>>
>>     from the kernel. I didn't let it finish, but I'd estimate it would
>>     take on the order of 5 hours. The problem is that push feeds each
>>     updated ref tip to find_unpushed_submodules(), so we end up walking
>>     over the same history over and over.
>>
>>     I think it should feed all of the "before" and "after" ref tips it
>>     proposes to update to a _single_ revision traversal.
>
> That sounds massively ... broken.  So before even thinking about
> flipping it to default, this needs to be fixed first.
>

I agree. That sounds bad.

However having the --auto-check feels like papering over the
actual problem which to me sounds like a design problem.
However this may be a viable short term solution.

About the design issue:
We need to answer the question: Which submodule commits
are referenced by a given set of superproject commits.

This question is advancing a very similar question that we'd
have to ask in git-gc. In gc we would end up without having to
worry about a specific set, but rather the all reachable commits
of the superproject are in the given set.

So we could solve two issues at the same time if we had a quick
way to answer this question quickly.

And for that I considered introducing a ref in the submodule
(e.g.) refs/superproject/gc_protector, which records both the
superprojects commit as well as the submodule commit
in case the superproject changed the submodule pointer.

I have some rough patches for this, but I did not consider sending
that to the list as the patches are still rough and not quite
fully thought out on the design level.

--
Actually screw this. After an office discussion with Jonathan (cc'd):

1)  We need to have an "alternate refs namespace", which contains
    secondary data (as this can be derived from the primary data with
    lots of compute). So maybe we need a file similar to the packed refs
    file, "superproject-refs" that behaves as if it is storing refs,
but that file
    is safe to delete as we know all its contents can be recreated.

2) In this new alternate refs space, we can have
    refs/superproject/<sha1> in the submodule with the sha1 of the
    superproject that points to a commit which is a dirty merge of all
    submodule commits, that have no other parents.

e.g.
In the superproject we might have a history of:

  A <- B <- C

with A being origin/master and C our local master.

In the submodule we have:

  D <- E <- F
    \
      G

F is the master branch in the submodule, G is a dangling commit.

Now we could have the following git links:

    A -> D
    B -> G
    C -> F

Then the refs/superproject/<C>
would be a merge of F and G.

When pushing in the superproject (A..C) we would need to make sure
the submodule is updated as well, i.e. we'd look at
refs/superproject/<C> to know we need to advance the remote
submodule history to contain at least G and F.

Then we can do a standard rev walk starting at G and F downwards until we
hit a commit that is contained in remote/refs/*.

> Why wasn't it discovered that "push.recursesubmodules = check" is
> UNUSABLE since it was introduced is simply beyond me.

Maybe it is not compatible with your workflow as you push only once a day?
In a centralized server model you rarely exceed a few commits on push time,
but
    $ git rev-list a829490...c08db5a |wc -l
    23
    $ git rev-list f5df0f2...f3c0fa5 |wc -l
    61
    $ git rev-list 8f73021...de36215 |wc -l
    95

and some new branches, so you pushed around 200 commits.
I think this misbehaves strongly for typical maintainer work flows.

> I am mad (which is unusual for me, isn't it? -- I've seen somebody
> else saying "I am mad", but I do not think I ever said it here).

Please direct your anger at the design of submodules.
The assumption of the submodule being sort of 'independant' doesn't allow
for efficient data structures I would think. So maybe we need to teach the
submodules about the existence and state of the superproject?

/me ducks
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html