Re: [RFD] Long term plan with submodule refs?

Stefan Beller <sbeller@xxxxxxxxxx> · Thu, 9 Nov 2017 11:57:01 -0800

On Wed, Nov 8, 2017 at 9:08 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Stefan Beller <sbeller@xxxxxxxxxx> writes:
>
>>> The relationship is indeed currently useful, but if the long term plan
>>> is to strongly discourage detached submodule HEAD, then I would think
>>> that these patches are in the wrong direction. (If the long term plan is
>>> to end up supporting both detached and linked submodule HEAD, then these
>>> patches are fine, of course.) So I think that the plan referenced in
>>> Junio's email (that you linked above) still needs to be discussed.
>>
>> This email presents different approaches.
>>
>> Objective
>> =========
>> This document should summarize the current situation of Git submodules
>> and start a discussion of where it can be headed long term.
>> Show different ways in which submodule refs could evolve.
>>
>> Background
>> ==========
>> Submodules in Git are currently treated as independent repositories.
>> This is okay for current workflows, such as utilizing a library that is
>> rarely updated. Other workflows that require a tighter integration between
>> submodule and superproject are possible, but cumbersome as there is an
>> additional step that has to be performed, which is the update of the gitlink
>> pointer in the superproject.
>
> I do not think "rarely updated" is an issue.
>
> The problem is that we may want to make it easier to use a
> superproject and its submodules as if the combined whole were a
> single project, which currently is not easy, primarily because
> submodules are separate entities with different set of branches that
> can be checked out independently from what branch the superproject
> is working on.

Well, this fact does not seem to be a problem in the current use of
submodules, precisely because the workflow either (a) is not too
cumbersome or (b) is not executed often enough to be a bother.

> These are good starting points for copying such a combined whole to
> your local machine and start working on it.  The more interesting,
> important, and potentially difficult part is how the result of such
> work is shared back to where you started from.  "push --recursive"
> may be a simple phrase, but a sensible definition of how it should
> work won't be that simple.
...
>
> We should make detached HEAD safe against gc if it is not,
> regardless of the use of submodules.  I thought it already was made
> safe long time ago.

The detached HEAD itself is protected via its reflog (whose entries expire
after 90 days by default, or after 30 days once they are no longer
reachable from the current tip).

If I were to develop using only detached HEADs in today's world of
submodules, with different branches in the superproject, I would run the
risk of losing some commits in the submodule: they are not at the detached
HEAD all the time, and may end up as tips that no ref points at.
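
To illustrate the risk, here is a minimal sketch in a throwaway
repository (the names tmp, branch, orphan and lost exist only for this
example). A commit made on a detached HEAD is, after switching away,
kept alive by the HEAD reflog alone:

```shell
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git config user.name "Example"
git config user.email "example@example.invalid"

git commit -q --allow-empty -m "base"
branch=$(git symbolic-ref --short HEAD)

# Commit on a detached HEAD, then go back to the branch.
git checkout -q --detach
git commit -q --allow-empty -m "made on detached HEAD"
orphan=$(git rev-parse HEAD)
git checkout -q "$branch"

# Ignoring reflogs, no ref reaches the commit any more.
lost=$(git fsck --unreachable --no-reflogs 2>/dev/null | grep "$orphan")
echo "$lost"
```

Once the reflog entry expires, nothing stops gc from pruning that commit.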

This combined with the previous paragraph brings in another important
concern:
Some projects would have a very different history when used as a
submodule compared to when used as a standalone project.
Other projects may be closely aligned between their branches and
what the superproject records.

So the more we deviate from the traditional branch model, the easier
we make it to have the submodule tips be very different from the
standalone tips, which may overexpose us to the gc issues as well as
the general question of how much these projects have in common.

>> Use replicate refs in submodules
>> --------------------------------
>> This approach will replicate the superproject refs into the submodule
>> ref namespace, e.g. git-branch learns about --recurse-submodules, which
>> creates a branch of a given name in all submodules. These (topic) branches
>> should be kept in sync with the superproject
>>
>> Pros:
>>  * This seemed intuitive to Gerrit users
>>  * 'quick' to implement, most of the commands are already there,
>>    just git-branch is needed to have the workflows mentioned above complete.
>> Cons:
>>  * What does "git checkout -b A B" mean? (special case: B == HEAD)
>
> The command ran at which level?  In the superproject, or in a single
> submodule?

In the superproject, with --recurse-submodules: A and B would recurse
as strings and would not change meaning depending on the gitlink value.
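
A rough sketch of that "recurse as strings" reading, emulated with
commands that exist today. The helper name checkout_b_by_name and the
NEW_BRANCH/START_POINT variables are made up for this example, and it
assumes a branch named B already exists in every submodule:

```shell
# Hypothetical helper: emulate "git checkout --recurse-submodules -b A B"
# with A and B handed down to each submodule as plain strings.
checkout_b_by_name() {
    new=$1
    start=$2
    git checkout -q -b "$new" "$start" &&
    NEW_BRANCH=$new START_POINT=$start \
        git submodule --quiet foreach --recursive \
        'git checkout -q -b "$NEW_BRANCH" "$START_POINT"'
}
```

Calling "checkout_b_by_name A B" in the superproject never looks at the
gitlink; each submodule resolves B in its own ref namespace, which is
exactly where the two interpretations can drift apart.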

>
>>    Is the branch name replicated as a string into the submodule operation,
>>    or do we dereference the superproject's gitlink and walk from there?
>
> If they are "kept in sync with the superproject", then there should
> be no difference between the two, so I do not see any room for
> wondering about that.

Except you can still break out by issuing commands in the submodule
to change the submodule refs to be different from the superproject.
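
As a sketch of how such a break-out could be noticed (the helper name
report_branch_mismatch and the SUPER_BRANCH variable are invented for
this example; only the branch name is compared, not the commits):

```shell
# Hypothetical helper: list submodules whose checked-out branch name
# differs from the branch the superproject is on.
report_branch_mismatch() {
    super_branch=$(git symbolic-ref --quiet --short HEAD) || return 0
    SUPER_BRANCH=$super_branch git submodule --quiet foreach '
        sub_branch=$(git symbolic-ref --quiet --short HEAD || echo "(detached)")
        [ "$sub_branch" = "$SUPER_BRANCH" ] ||
            echo "$sm_path: on $sub_branch, superproject on $SUPER_BRANCH"
    '
}
```

It prints nothing while the names agree, and one line per submodule
once somebody ran "git checkout -b something-else" inside it.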

This was also more along the lines of thinking about the (Gerrit) remote,
which does an okay, but not stellar, job of keeping the remote branches
for superproject and submodule in sync. I'd expect glitches there.

> In other words, if there is need to worry
> about the differences between the above two, then it probably is
> fundamentally impossible to keep these in sync, and a design that
> assumes it is possible would have to expose glitches to the end-user
> experience.

yup. And by exposing you probably mean a patch series as presented?
(git status/log/diff making noise about the glitch?)

> I do not know if glitches resulting from there would be so severe to
> be show-stoppers, though.  It might be possible to paper them over.

I think so, too, as long as the user is pointed at the glitches so that
they can be corrected.

>
>> No submodule refstore at all
>> ----------------------------
>> Use refs and commits in the superproject to stitch submodule changes
>> together. Disallow branches in the submodule. This is only restricted
>> to the working tree inside the superproject, such that the output of git-branch
>> changes depending whether the working tree is in- or outside the superproject
>> working tree.
>
> This would need enhancement for reachability code, but it feels the
> cleanest from the philosophical standpoint---if you want to treat a
> superproject and its submodules as if it were a single project,
> ability to check out a branch in a submodule that does not match
> that of the superproject would only get in the way of preserving the
> illusion of "single project"-ness.

I wonder if we can combine this with the approach Jonathan gave above.
In the worktree (of the submodule inside the superproject) you are allowed
to use these "mirrored" refs, whereas in any other worktree you have full
access to the normal refs of the project.

>
>> New type of symbolic refs
>> =========================
>> A symbolic ref can currently only point at a ref or another symbolic ref.
>> This proposal showcases different scenarios on how this could change in the
>> future.
>>
>> HEAD pointing at the superproject's index
>> -----------------------------------------
>
> This looks to me a mere implementation detail for a (part of)
> necessary component to realize the above "No submodule refstore".

Ah ok.

If all branches would use this new symref type, the handling would
seem to be very similar to what Jonathan described with a new type
of refstore instead.
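
For reference, a small sketch of what a symbolic ref can express today,
in a throwaway repository (tmp and head_target are names used only
here). HEAD stores the name of another ref and nothing else, so
"the superproject's index entry for this submodule" has no
representation yet:

```shell
tmp=$(mktemp -d)
git init -q "$tmp/demo"
cd "$tmp/demo"

# HEAD is a symbolic ref: it names another ref, not an object.
git symbolic-ref HEAD refs/heads/topic
head_target=$(git symbolic-ref HEAD)
echo "$head_target"   # refs/heads/topic
```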

>> Superproject operations spanning index and worktree
>>   E.g. git reset --mixed
>> As the submodule's HEAD is defined in the index, we would reset it to the
>> version in the last commit. As --mixed promises to not touch the working tree,
>> the submodule's worktree would not be touched. git reset --mixed in the
>> superproject is the same as --soft in the submodule.
>
> I am not sure if you want to take these promises low-level "single
> repository" plumbing operations make too literally.  "reset --mixed"
> may promise not to touch the working tree, but it also promises not
> to touch submodules at all.  If you are breaking the latter anyway,
> it would make more sense not to be afraid of breaking the former if
> it makes sense in the context of allowing the command to do more by
> breaking the latter.

ok.

Thanks,
Stefan