Re: [RFD] Long term plan with submodule refs?

Jonathan Tan <jonathantanmy@xxxxxxxxxx> · Wed, 8 Nov 2017 17:29:45 -0800

On Wed,  8 Nov 2017 16:10:07 -0800
Stefan Beller <sbeller@xxxxxxxxxx> wrote:

I thought of a possible alternative and how it would work.

> Possible data models and workflow implications
> ==============================================
> In the following, different data models are presented that aid a
> submodule-heavy workflow, each with its pros and cons.

What if, in the submodule, we had a new ref backend that mirrors the
superproject? When the submodule is initialized, its original refs are
not cloned at all; virtual refs are used instead.

Creation of brand-new refs is forbidden in the submodule.

When reading a ref in the submodule, if that ref is the current branch
in the superproject, read the corresponding gitlink entry in the
superproject's index (which may be dirty); otherwise read the gitlink in
the tree of that branch's tip commit.
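To make the read rule concrete, here is a toy model in Python. It is only a sketch under simplifying assumptions. Objects are reduced to dicts that hold nothing but gitlinks. The names (Superproject, read_virtual_ref) and ids like "c1" are hypothetical and are not anything that exists in Git.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Superproject:
    """Hypothetical toy model: only gitlink entries are tracked."""
    head: str                        # current branch, e.g. "refs/heads/foo"
    index: dict[str, str]            # gitlink path -> submodule commit (may be dirty)
    tips: dict[str, dict[str, str]]  # branch -> gitlinks in that branch's tip tree


def read_virtual_ref(sup: Superproject, ref: str, gitlink_path: str) -> str:
    """Value of `ref` as seen by the submodule checked out at `gitlink_path`."""
    if ref == sup.head:
        # Current branch of the superproject: use the index entry.
        return sup.index[gitlink_path]
    # Any other branch: use the gitlink recorded in its tip commit.
    return sup.tips[ref][gitlink_path]


sup = Superproject(
    head="refs/heads/foo",
    index={"lib": "c2"},  # "lib" staged at c2, not yet committed
    tips={"refs/heads/foo": {"lib": "c1"}, "refs/heads/main": {"lib": "c0"}},
)
print(read_virtual_ref(sup, "refs/heads/foo", "lib"))   # c2
print(read_virtual_ref(sup, "refs/heads/main", "lib"))  # c0
```

Because the current branch reads from the index, a submodule commit that has only been staged in the superproject is already visible as the branch's value.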

When updating a ref in the submodule, if that ref is the current branch
in the superproject, update the superproject's index; otherwise, create
a superproject commit on top of that branch's tip and update the branch
to point to the new commit.
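The update rule can be sketched in the same toy style, together with the ban on creating brand-new refs. Again, all names are hypothetical. Commit ids are made-up labels (S0, S1, ...) rather than real hashes. For the current branch, the index update would correspond to something like "git update-index --cacheinfo 160000,<sha1>,<path>" in the superproject.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass, field


@dataclass
class Commit:
    gitlinks: dict[str, str]  # gitlink path -> submodule commit
    parent: str | None


@dataclass
class Superproject:
    """Hypothetical toy model: only gitlink entries are tracked."""
    head: str                   # current branch
    index: dict[str, str]       # gitlink path -> submodule commit
    refs: dict[str, str] = field(default_factory=dict)        # branch -> commit id
    commits: dict[str, Commit] = field(default_factory=dict)  # commit id -> commit
    ids: itertools.count = field(default_factory=itertools.count)

    def new_commit(self, gitlinks: dict[str, str], parent: str | None) -> str:
        cid = f"S{next(self.ids)}"  # made-up id instead of a real hash
        self.commits[cid] = Commit(dict(gitlinks), parent)
        return cid


def update_virtual_ref(sup: Superproject, ref: str, gitlink_path: str, oid: str) -> None:
    """Apply a ref write made in the submodule at `gitlink_path`."""
    if ref not in sup.refs:
        # Refs only mirror the superproject, so brand-new refs are rejected.
        raise ValueError(f"cannot create {ref} in the submodule")
    if ref == sup.head:
        # Current branch: only the superproject's index changes.
        sup.index[gitlink_path] = oid
        return
    # Other branch: commit on top of its tip and advance the branch.
    tip = sup.refs[ref]
    gitlinks = {**sup.commits[tip].gitlinks, gitlink_path: oid}
    sup.refs[ref] = sup.new_commit(gitlinks, parent=tip)


sup = Superproject(head="refs/heads/foo", index={"lib": "c1"})
sup.refs["refs/heads/foo"] = sup.new_commit({"lib": "c1"}, parent=None)  # S0
sup.refs["refs/heads/bar"] = sup.new_commit({"lib": "c1"}, parent=None)  # S1

update_virtual_ref(sup, "refs/heads/foo", "lib", "c2")  # index only
update_virtual_ref(sup, "refs/heads/bar", "lib", "c3")  # new commit S2 on bar
print(sup.index, sup.refs)
# {'lib': 'c2'} {'refs/heads/foo': 'S0', 'refs/heads/bar': 'S2'}
```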

No synchronicity of HEAD is enforced between the superproject and the
submodule, though: if a submodule is currently checked out to a branch,
and the gitlink for that branch is updated by whatever means, that is
equivalent to a "git reset --soft" in the submodule.
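Here is a small sketch of why this amounts to a "git reset --soft". The submodule's HEAD keeps naming the branch, and its index and working tree are left alone. Only the commit that the branch resolves to changes. The model is hypothetical, and it simplifies the index and working tree to "matches the tree of commit X".

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Submodule:
    """Hypothetical toy model of a submodule checkout."""
    head: str              # "refs/heads/<name>" when on a branch, else a commit id
    index_matches: str     # commit whose tree the submodule's index matches
    worktree_matches: str  # commit whose tree the working tree matches


def resolve_head(sub: Submodule, virtual_refs: dict[str, str]) -> str:
    """virtual_refs holds the branch values read from the superproject."""
    if sub.head.startswith("refs/"):
        return virtual_refs[sub.head]
    return sub.head  # a detached HEAD stores the commit directly


def staged_changes(sub: Submodule, virtual_refs: dict[str, str]) -> bool:
    """Simplification: compares commit ids instead of trees."""
    return resolve_head(sub, virtual_refs) != sub.index_matches


virtual_refs = {"refs/heads/foo": "c1"}
sub = Submodule(head="refs/heads/foo", index_matches="c1", worktree_matches="c1")
print(staged_changes(sub, virtual_refs))  # False

# The superproject's gitlink for "foo" is updated by whatever means.
virtual_refs["refs/heads/foo"] = "c2"
print(resolve_head(sub, virtual_refs), staged_changes(sub, virtual_refs))  # c2 True
# Index and working tree still match c1, as after "git reset --soft c2".
```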

These rules seem straightforward to me (although I have been working
with Git for a while, so perhaps I'm not the best judge), and I think
they lead to a good workflow, as discussed below.

> Workflows
> =========
> * Obtaining a copy of the Superproject tightly coupled with submodules
>   solved via git clone --recurse-submodules=<pathspec>
> * Changing the submodule selection
>   solved via submodule.active flags
> * Changing the remote / Interacting with a different remote for all submodules
>   -> needs to be solved; not a core issue of this discussion
> * Syncing to the latest upstream
>   solved via git pull --recurse  

(skipping the above, since they are either solved or not a core issue)

> * Working on a local feature in one submodule
>   -> How do refs work spanning superproject/submodule?

This is perhaps one weak point of my proposal: you can't work on a
submodule as if it were independent. You can check out a branch and make
commits, but (i) they will automatically affect the superproject, and
(ii) the "origin/foo" etc. branches are those of the superproject. (But
if you check out a detached HEAD, everything should still work.)

> * Working on a feature spanning multiple submodules
>   -> How do refs work spanning multiple repos?

The above rules allow the following workflow:
 - check out the branch you want in the superproject with
   "checkout --recurse-submodules"
 - make whatever changes you want in each submodule
 - commit each individual submodule (which updates the index of the
   superproject), then commit the superproject (we can introduce a
   commit --recurse-submodules to make this more convenient)
 - a "push --recurse-submodules" can be implemented to push the
   superproject and its submodules independently (and the same refspec
   can be legitimately used both when pushing the superproject and when
   pushing a submodule, since the ref names are the same, and not by
   coincidence)
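The commit and push steps of this workflow might look roughly like the following sketch. Both functions are hypothetical models of the proposal, not existing Git behavior. Ids like "a1" are placeholders. The push plan only builds "git -C <path> push <remote> <refspec>" command lines and does not run them.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Superproject:
    """Hypothetical toy model: only gitlink entries are tracked."""
    head: str                    # current branch, e.g. "refs/heads/topic"
    index: dict[str, str]        # gitlink path -> submodule commit
    history: list[dict[str, str]] = field(default_factory=list)  # committed snapshots


def commit_recurse_submodules(sup: Superproject, new_commits: dict[str, str]) -> None:
    """Hypothetical "commit --recurse-submodules"."""
    for path, oid in new_commits.items():
        # Committing in a submodule on the current branch only updates
        # the superproject's index.
        sup.index[path] = oid
    # One superproject commit then records all the new gitlinks.
    sup.history.append(dict(sup.index))


def push_plan(sup: Superproject, remote: str = "origin") -> list[list[str]]:
    """Hypothetical "push --recurse-submodules": one push per repository,
    all with the same refspec, since submodule ref names mirror the
    superproject's."""
    refspec = f"{sup.head}:{sup.head}"
    repos = ["."] + sorted(sup.index)
    return [["git", "-C", repo, "push", remote, refspec] for repo in repos]


sup = Superproject(head="refs/heads/topic", index={"libA": "a0", "libB": "b0"})
commit_recurse_submodules(sup, {"libA": "a1", "libB": "b1"})
print(sup.history)  # [{'libA': 'a1', 'libB': 'b1'}]
for argv in push_plan(sup):
    print(" ".join(argv))
# git -C . push origin refs/heads/topic:refs/heads/topic
# git -C libA push origin refs/heads/topic:refs/heads/topic
# git -C libB push origin refs/heads/topic:refs/heads/topic
```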

If the user insists on making changes on a non-current branch (i.e. by
creating commits in submodules then using "git update-ref" or
equivalent), possibly multiple commits would be created in the
superproject, but the user can still squash them later if desired.
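Consider updating the refs of two submodules for a branch that is not checked out in the superproject. Under the update rule, this yields two superproject commits, which can afterwards be squashed into one. Below is a toy sketch with hypothetical names; "a1" and similar are placeholder ids.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Commit:
    gitlinks: dict[str, str]       # gitlink path -> submodule commit
    parent: Commit | None = None


def update_non_current(tip: Commit, path: str, oid: str) -> Commit:
    """Each ref write on a non-current branch adds one superproject commit."""
    return Commit({**tip.gitlinks, path: oid}, parent=tip)


def squash(tip: Commit, base: Commit) -> Commit:
    """Replace all commits after `base` by one with the tip's gitlinks."""
    return Commit(dict(tip.gitlinks), parent=base)


def commits_since(tip: Commit, base: Commit) -> int:
    count = 0
    while tip is not base:
        tip, count = tip.parent, count + 1
    return count


base = Commit({"libA": "a0", "libB": "b0"})
tip = update_non_current(base, "libA", "a1")  # e.g. "git update-ref" in libA
tip = update_non_current(tip, "libB", "b1")   # ... and then in libB
print(commits_since(tip, base))  # 2

squashed = squash(tip, base)
print(commits_since(squashed, base), squashed.gitlinks)  # 1 {'libA': 'a1', 'libB': 'b1'}
```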

> * Working on a bug fix (Changing the feature that you currently work on, branches)
>   -> How does switching branches in the superproject affect submodules

You will have to stash or commit your changes. (Which reminds me...GC in
the submodule will need to consult the reflog of the superproject too.)
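Here is a rough sketch of what consulting the superproject's reflog could mean for GC in a submodule. Besides the submodule's (virtual) refs, GC would also keep as a root every gitlink recorded in a superproject commit that still appears in the superproject's reflog. The function and parameter names are hypothetical, and "S1"/"c1" are placeholder ids.

```python
from __future__ import annotations


def submodule_gc_roots(
    virtual_refs: dict[str, str],
    superproject_reflog: list[str],
    gitlinks_at: dict[str, dict[str, str]],
    gitlink_path: str,
) -> set[str]:
    """Hypothetical: submodule commits that GC must treat as reachable."""
    roots = set(virtual_refs.values())
    for super_commit in superproject_reflog:
        # Gitlinks in superproject commits still mentioned by the reflog.
        oid = gitlinks_at[super_commit].get(gitlink_path)
        if oid is not None:
            roots.add(oid)
    return roots


# Superproject branch "foo" went S1 -> S2 and was then reset back to S1,
# so submodule commit c2 is only reachable through the reflog.
gitlinks_at = {"S1": {"lib": "c1"}, "S2": {"lib": "c2"}}
virtual_refs = {"refs/heads/foo": "c1"}
reflog = ["S1", "S2", "S1"]  # newest first
print(sorted(submodule_gc_roots(virtual_refs, reflog, gitlinks_at, "lib")))  # ['c1', 'c2']
```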

> New type of symbolic refs
> =========================
> A symbolic ref can currently only point at a ref or another symbolic ref.
> This proposal showcases different scenarios on how this could change in the
> future.
> 
> HEAD pointing at the superprojects index
> ----------------------------------------

Assuming we don't need synchronicity, the existing HEAD format can be
retained. To clarify what happens during ref writes, I'll reuse the
scenarios Stefan described:

> Ref write operations driven by the submodule, affecting target ref
>   e.g. git commit, reset --hard, update-ref (in the submodule)
> 
> The HEAD stays the same, pointing at the superproject.
> The gitlink is changed to the target sha1, using
> 
>   git -C <superproject> update-index --add \
>       --cacheinfo 160000,$SHA1,<gitlink-path>
> 
> This will affect the superproject's index, such that a commit in the
> superproject is then needed.

In this proposal, the HEAD also stays the same (pointing at the branch).

Either the superproject's index is updated (if the branch is the
superproject's current branch) or a superproject commit is needed; in
the latter case, the commit is performed automatically.

> Ref write operations driven by the superproject, changing the gitlink
>   e.g. git checkout <tree-ish>, git reset --hard (in the superproject)
> 
> This will change the gitlink in the superproject's index, such that the HEAD
> in the submodule changes, which would trigger an update of the
> submodule's working tree.

The HEAD in the submodule is unchanged. If the value of a ref has
changed "from underneath", this is as if a "git reset --soft" was done.

> Superproject operations spanning index and worktree
>   E.g. git reset --mixed
> As the submodule's HEAD is defined in the index, we would reset it to the
> version in the last commit. As --mixed promises to not touch the working tree,
> the submodule's worktree would not be touched. git reset --mixed in the
> superproject is the same as --soft in the submodule.

Same.

> Consistency considerations (gc)
>   e.g. git gc --aggressive --prune=now
> 
> The repacking logic is already aware of a detached HEAD, such that
> using this new symref mechanism would not generate problems as long as
> we keep the HEAD attached to the superproject. However when commits/objects
> are created while the HEAD is attached to the superproject and then HEAD
> switches to a local branch, there are problems with the created objects
> as they seem unreachable now.
> 
> This problem is not new as a superproject may record submodule objects
> that are not reachable from any of the submodule branches. Such objects
> fall prey to overzealous packing in the submodule.

The scenario Stefan describes will work OK: if a commit is created in
the submodule while its HEAD is pointing to a branch, then either the
superproject's index will be updated or commits will be created in the
superproject.
When GC reads the list of refs in the submodule, the new submodule
commit will be included. (Remember that if the superproject's current
branch is "foo", "refs/heads/foo" in the submodule reflects the
superproject's index, so any changes to the index, though uncommitted,
will appear as a ref.)

The problem still exists (e.g. stashes in the superproject) but is
reduced, I think.
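To illustrate the point about GC and uncommitted index changes, here is a hypothetical sketch of the ref listing such a backend might hand to GC. It produces one entry per superproject branch, and the current branch takes its value from the superproject's index. A submodule commit that is only staged in the superproject therefore still shows up. Names and ids are made up.

```python
from __future__ import annotations


def list_virtual_refs(
    head: str,
    index: dict[str, str],
    tips: dict[str, dict[str, str]],
    gitlink_path: str,
) -> dict[str, str]:
    """Hypothetical ref iteration in the submodule at `gitlink_path`."""
    refs = {}
    for ref, tip_gitlinks in tips.items():
        # The current branch reflects the superproject's index, dirty or not.
        source = index if ref == head else tip_gitlinks
        if gitlink_path in source:
            refs[ref] = source[gitlink_path]
    return refs


tips = {"refs/heads/foo": {"lib": "c1"}, "refs/heads/main": {"lib": "c0"}}
index = {"lib": "c2"}  # submodule commit c2 staged, not committed, in the superproject
refs = list_virtual_refs("refs/heads/foo", index, tips, "lib")
print(refs)  # {'refs/heads/foo': 'c2', 'refs/heads/main': 'c0'}
print("c2" in refs.values())  # True, so GC keeps c2
```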