Re: [PATCH v3 8/8] clone, submodule update: create and check out branches

Glen Choo <chooglen@xxxxxxxxxx> · Tue, 22 Nov 2022 10:44:25 -0800

Thanks for the thoughtful response, Jonathan :)

Jonathan Tan <jonathantanmy@xxxxxxxxxx> writes:

> "Glen Choo via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
>> +# Test the behavior of an already-cloned submodule.
>> +# NEEDSWORK When updating with branches, we always use the branch instead of the
>> +# gitlink's OID. This results in some imperfect behavior:
>> +#
>> +# - If the gitlink's OID disagrees with the branch OID, updating with branches
>> +#   may result in a dirty worktree
>> +# - If the branch does not exist, the update fails.
>> +#
>> +# We will reevaluate when "git checkout --recurse-submodules" supports branches
>> +# For now, just test for this imperfect behavior.
>
> I think the rationale for this behavior is as follows:
>
> We want a world in which submodules have branches and Git commands use them
> wherever possible. There are a few options for "git submodule update" when the
> superproject has a branch checked out:
>
> 1. Check out the branch, ignoring OID (as in this patch).
> 2. Check out the branch, erroring out if the OID is wrong.
> 3. 1 + creating the branch if it does not exist.
> 4. 2 + creating the branch if it does not exist.
> 5. Always forcibly create the branch at the gitlink's OID and then check
>    it out.
>
> At this point in the discussion, for a low-level command like "git submodule
> update", doing as little as possible makes sense to me, which is 1.
>
> But since we do not automatically create the branch if it does not exist, this
> means that we have to do it when we clone the submodule. Our options are:
>
> A. Create only the branch that is checked out in the superproject (as in this
>    patch).
> B. Create all branches that are present in the superproject.
> C. Go back on our previous decision, switching to 3.
>
> My instinct is that we want to maintain, as much as possible, the invariant
> that for each branch in the superproject, if the branch tip has a gitlink
> pointing to a submodule, that submodule has a branch of the same name. And I
> think that this invariant can only be maintained by "git submodule update" if
> we use B or C.

I think C is good to have in this series, though for slightly different
reasons.

I agree that the invariant should be preserved when we check out
branches both in the initial clone and in subsequent checkouts. However,
I don't think that the submodule necessarily needs every superproject
branch right after the initial clone. Even if the submodule has only the
one branch matching the superproject's checked-out branch, that's enough
for an ephemeral clone used to write small changes. We could defer the
"all superproject branches" problem until we have dealt with subsequent
checkouts (i.e. "git checkout" with branches).

We can handle "initial clone" and "subsequent checkout" as smaller, more
digestible series as long as the work for "initial clone" doesn't get in
the way of "subsequent checkout". My plan (as of v2) was:

- For the initial clone, create only the checked-out superproject branch
  at clone time and check it out (aka A)
- For subsequent checkouts, check out the superproject branch, creating
  it if it does not exist (aka C)

But it doesn't make sense to combine A _and_ C, since C would already
give us the same result as A, so it probably makes sense to go straight
to C in this series (i.e. only for the initial clone, not subsequent
checkouts). I'll do that in v4.
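To make C concrete, here is a rough shell sketch of what it would do for
one submodule, written with plain git commands rather than the actual code
in this series. The helper name `update_sub_with_branch` and its arguments
are hypothetical, and the sketch assumes the gitlink's OID is already
present in the submodule's object store.

```shell
# Hypothetical helper (not part of git): option C for one submodule.
# Usage: update_sub_with_branch <submodule-path> <branch> <gitlink-oid>
update_sub_with_branch () {
	sub=$1 branch=$2 oid=$3
	if ! git -C "$sub" rev-parse --verify --quiet "refs/heads/$branch" >/dev/null
	then
		# The branch is missing: create it at the gitlink's OID.
		git -C "$sub" branch "$branch" "$oid" || return 1
	fi
	# The branch exists (or was just created): check it out as-is, even if
	# its tip disagrees with the gitlink (the "ignore the OID" part of 1).
	git -C "$sub" checkout -q "$branch"
}

# Demo: a throwaway repository stands in for an already-cloned submodule.
tmp=$(mktemp -d)
git init -q "$tmp/sub"
git -C "$tmp/sub" -c user.name=demo -c user.email=demo@example.com \
	commit -q --allow-empty -m initial
oid=$(git -C "$tmp/sub" rev-parse HEAD)
update_sub_with_branch "$tmp/sub" topic "$oid"
git -C "$tmp/sub" symbolic-ref --short HEAD   # prints "topic"
```

A second call leaves an existing "topic" where it is rather than resetting
it to the gitlink, which matches the "ignoring OID" half of option 1.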

I prefer C in the long run, since both A and B require that the list of
submodule branches never get out of sync with the superproject, which is
hard to enforce, e.g.:

- The user could create a branch in the superproject without recursing
  into submodules.
- The user could delete the branch in the submodule.
- (Worst of all) The process that creates branches in the submodule _after_
  creating the branch in the superproject could exit unexpectedly (e.g.
  SIGINT). There is no atomic way to create branches in both repos.

We could create a command that would repair broken branch states ("git
submodule repair"?), but C is self-repairing: the next update simply
recreates any missing submodule branch, which avoids this problem
entirely.
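As a rough illustration of that self-repair, the shell sketch below
simulates the second failure mode above (the submodule's branch was
deleted) and then reruns C-style logic, which recreates the branch from the
gitlink instead of failing. The helper `ensure_branch` is hypothetical,
written with ordinary git commands, and is not code from this series.

```shell
# Hypothetical helper (not part of git): create <branch> at <oid> in the
# repository at <path> if it is missing, then check it out.
# Usage: ensure_branch <path> <branch> <oid>
ensure_branch () {
	if ! git -C "$1" rev-parse --verify --quiet "refs/heads/$2" >/dev/null
	then
		git -C "$1" branch "$2" "$3" || return 1
	fi
	git -C "$1" checkout -q "$2"
}

# Set up a throwaway "submodule" with branch "topic" at the gitlink OID.
tmp=$(mktemp -d)
git init -q "$tmp/sub"
git -C "$tmp/sub" -c user.name=demo -c user.email=demo@example.com \
	commit -q --allow-empty -m initial
gitlink=$(git -C "$tmp/sub" rev-parse HEAD)
ensure_branch "$tmp/sub" topic "$gitlink"

# Break the invariant: detach HEAD and delete the submodule's branch.
git -C "$tmp/sub" checkout -q --detach
git -C "$tmp/sub" branch -D topic >/dev/null

# Rerunning the same update repairs it: "topic" is recreated and checked out.
ensure_branch "$tmp/sub" topic "$gitlink"
git -C "$tmp/sub" symbolic-ref --short HEAD   # prints "topic"
```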