Thanks for the thoughtful response, Jonathan :)

Jonathan Tan <jonathantanmy@xxxxxxxxxx> writes:

> "Glen Choo via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
>> +# Test the behavior of an already-cloned submodule.
>> +# NEEDSWORK When updating with branches, we always use the branch instead of the
>> +# gitlink's OID. This results in some imperfect behavior:
>> +#
>> +# - If the gitlink's OID disagrees with the branch OID, updating with branches
>> +#   may result in a dirty worktree
>> +# - If the branch does not exist, the update fails.
>> +#
>> +# We will reevaluate when "git checkout --recurse-submodules" supports branches
>> +# For now, just test for this imperfect behavior.
>
> I think the rationale for this behavior is as follows:
>
> We want a world in which submodules have branches and Git commands use them
> wherever possible. There are a few options for "git submodule update" when the
> superproject has a branch checked out:
>
> 1. Checkout the branch, ignoring OID (as in this patch).
> 2. Checkout the branch, erroring out if the OID is wrong.
> 3. 1 + creating the branch if it does not exist.
> 4. 2 + creating the branch if it does not exist.
> 5. Always forcibly create the branch at the gitlink's OID and then checking
>    it out.
>
> At this point in the discussion, for a low-level command like "git submodule
> update", doing as little as possible makes sense to me, which is 1.
>
> But since we do not automatically create the branch if it does not exist, this
> means that we have to do it when we clone the submodule. Our options are:
>
> A. Create only the branch that is checked out in the superproject (as in this
>    patch).
> B. Create all branches that are present in the superproject.
> C. Go back on our previous decision, switching to 3.
>
> My instinct is that we want to maintain, as much as possible, the invariant
> that for each branch in the superproject, if the branch tip has a gitlink
> pointing to a submodule, that submodule has a branch of the same name. And I
> think that this invariant can only be maintained by "git submodule update" if
> we use B or C.

I think C is good to have in this series, though for slightly different
reasons.

I agree that the invariant should be preserved when we check out
branches, both in the initial clone and in subsequent checkouts.
However, I don't think that we necessarily need to have all superproject
branches after the initial clone. Even if the submodule only has a
single superproject branch, that's enough to have an ephemeral clone for
writing small changes. We could defer the "all superproject branches"
problem until after we worry about subsequent checkouts (i.e. "git
checkout" with branches).

We can handle "initial clone" and "subsequent checkout" as smaller, more
digestible series as long as the work for "initial clone" doesn't get in
the way of "subsequent checkout". My plan (as of v2) was:

- For the initial clone, create only the checked out superproject branch
  at clone time and check it out (aka A)
- For subsequent checkouts, check out the superproject branch, creating
  it if it does not exist (aka C)

But it doesn't make sense to mix both A _and_ C, since C would already
give us the same result as A, so it probably makes sense to go straight
to C in this series (i.e. only for the initial clone, not subsequent
checkouts). I'll do that in v3.

I prefer C in the long run, since both A and B require that the list of
submodule branches never get out of sync with the superproject, which is
hard to enforce, e.g.:

- The user could create a branch in the superproject without recursing
  into submodules.
- The user could delete the branch in the submodule.
- (Worse yet) The process that creates branches in the submodule _after_
  creating the branch in the superproject could exit unexpectedly (e.g.
  SIGINT). There is no atomic way to create branches in both repos.

We could create a command that would repair broken branch states
("git submodule repair"?), but C can self-repair, which avoids this
problem entirely.
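
To make the self-repair point concrete, here is a rough toy model in Python. It is only a sketch and not how "git submodule update" is implemented: the submodule's branches are modeled as a plain dict from branch name to OID, the function names and OIDs are made up for illustration, and I am assuming that option 3/C creates a missing branch at the gitlink's OID.

```python
"""Toy model of two update options; hypothetical, not Git's implementation."""


class BranchMissing(Exception):
    """Raised when the submodule lacks the superproject's branch."""


def update_option_1(sub_branches, branch, gitlink_oid):
    """Option 1: check out the existing branch, ignoring the gitlink's OID."""
    if branch not in sub_branches:
        raise BranchMissing(branch)
    # The submodule's HEAD ends up wherever the branch already points.
    return sub_branches[branch]


def update_option_c(sub_branches, branch, gitlink_oid):
    """Option C (option 1 + create the branch if it does not exist).

    Assumption: a missing branch is created at the gitlink's OID.
    An existing branch is left where it is.
    """
    sub_branches.setdefault(branch, gitlink_oid)
    return sub_branches[branch]


# Hypothetical out-of-sync state: the superproject has "topic" checked out,
# but the submodule has no "topic" branch (deleted, or never created because
# the creating process was interrupted).
submodule = {"main": "aaa111"}

try:
    update_option_1(dict(submodule), "topic", "bbb222")
    option_1_result = "ok"
except BranchMissing:
    option_1_result = "failed"

# Option C repairs the missing branch as a side effect of the update.
option_c_head = update_option_c(submodule, "topic", "bbb222")
```

With option 1 the update fails as soon as the branch lists drift apart, whereas with C the same update recreates the branch, so no separate repair step is needed in this model.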