On Wed, Jan 08, 2014 at 10:17:51PM -0800, W. Trevor King wrote:
> In another branch of the submodule thread Francesco kicked off, I
> mentioned that we could store the preferred local submodule branch on
> a per-superbranch level if we used the
> .git/modules/<submodule-name>/config for local overrides [1]. Here's
> a patch series that greatly extends my v2 "submodule: Respect
> requested branch on all clones" series [2] to also support automatic,
> recursive submodule checkouts, as I outlined here [3].
>
> [1]: http://article.gmane.org/gmane.comp.version-control.git/240240
> [2]: http://article.gmane.org/gmane.comp.version-control.git/239967
> [3]: http://article.gmane.org/gmane.comp.version-control.git/240192

While mulling over better ways to explain my local-branch idea, I've
come up with a more tightly bound model that may help break the
silence that has greeted the “Preferred local submodule branches”
series ;). That series doesn't have strong options on update
mechanics, which leads to wishy-washy exchanges where nobody has a
clear mental picture:

On Thu, Jan 09, 2014 at 10:40:52PM +0100, Jens Lehmann wrote:
> On 09.01.2014 20:55, W. Trevor King wrote:
> > On Thu, Jan 09, 2014 at 08:23:07PM +0100, Jens Lehmann wrote:
> >> On 09.01.2014 18:32, W. Trevor King wrote:
> >>>> when superproject branches are merged (with and without conflicts),
> >>>
> >>> I don't think this currently does anything to the submodule itself,
> >>> and that makes sense to me (use 'submodule update' or my 'submodule
> >>> checkout' if you want such effects). We should keep the current logic
> >>> for updating the gitlinked $sha. In the case that the
> >>> .gitmodule-configured local-branches disagree, we should give the
> >>> usual conflict warning (and <<<===>>> markup) and let the user resolve
> >>> the conflict in the usual way.
> >>
> >> For me it makes lots of sense that in recursive checkout mode the
> >> merged submodules are already checked out (if possible) right after
> >> a superproject merge, making another "submodule update" unnecessary
> >> (the whole point of recursive update is to make "submodule update"
> >> obsolete, except for "--remote").
> >
> > If you force the user to have the configured local-branch checked out
> > before a non-checkout operations with checkout side-effects (as we
> > currently do for other kinds of dirty trees), I think you'll avoid
> > most (all?) of the branch-clobbering problems.
>
> I'm thinking that a local branch works in two directions: It should
> make it easy to follow an upstream branch and also make changes to it
> (and publish those) if necessary. But neither local nor upstream
> changes take precedence, so the user should either use "merge" or
> "rebase" as update strategy or be asked to resolve the conflict
> manually when "checkout" is configured and the branches diverged.
> Does that make sense?

The current series is only weakly bound (you can explicitly call
'git submodule checkout' to change to the preferred local submodule
branch), and the current Git is extremely weakly bound (you have to cd
into the submodule and change branches by hand). The following
extrapolates the “Preferred local submodule branches” series to a
tightly-bound ideal.

Gitlinked commit hash
---------------------

The submodule model revolves around links to commits (“gitlinks”):

  $ git ls-tree HEAD
  100644 blob 189fc359d3dc1ed5019b9834b93f0dfb49c5851f    .gitmodules
  160000 commit fbfa124c29362f180026bf0074630e8bd0ff4550    submod

These are effectively switchable trees.
The tree referenced by commit fbfa124 is 492781c:

  $ (cd submod/ && git cat-file commit fbfa124)
  tree 492781c581d4dec380a61ef5ec69a104de448a74
  …

If you init the submodule, subsequent checkouts will check out that
tree, just like 'git checkout' would do if you'd had a superproject
tree like:

  $ git ls-tree HEAD
  100644 blob 189fc359d3dc1ed5019b9834b93f0dfb49c5851f    .gitmodules
  040000 tree 492781c581d4dec380a61ef5ec69a104de448a74    submod

For folks who treat the submodule as a black box (and do no local
development), switchable trees are all they care about. They can
easily checkout (or not, with deinit) the submodule tree at a
gitlinked hash, and everything is nice and reproducible. The fact
that 'submod' is stored as a commit object and not a tree is just a
convenient marker for optional init/deinit/remote-update-integration
functionality.

Additional metadata, the initial checkout, and syncing down
-----------------------------------------------------------

However, folks who do local submodule development will care about
which submodule commit is responsible for that tree, because that's
going to be the base of their local development. They also care
about additional out-of-tree information, including the branch that
commit is on. For already-initialized submodules, there are existing
places in the submodule config to store this configuration:

  1. HEAD for the checked-out branch,
  2. branch.<name>.remote → remote.<name>.url for the upstream
     subproject URL,
  4. branch.<name>.rebase (or pull.rebase) to prefer rebase over merge
     for integration,
  5. …

You need somewhere in-tree to store this destined-to-be-out-of-tree
information, so that superproject developers that have not yet
initialized the submodule will know what values are suggested by the
superproject maintainers. That's where .gitmodules comes in, because
storing all of this fairly static, locally overridable information in
the gitlink itself would be nonsensical (said Linus in 2007 [1]).
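
As a rough illustration of the two storage locations (a sketch only,
not part of the series), the in-tree defaults and the submodule's own
out-of-tree values can both be read with plain 'git config'. The
helper names gitmodules_default and submodule_upstream_url are
hypothetical, and the second one assumes the checked-out branch has a
branch.<name>.remote configured:

```shell
#!/bin/sh

# Hypothetical helper: print the in-tree default for one submodule
# setting, reading .gitmodules (or another file given as $3).
# Exits non-zero if the key is not set.
# Usage: gitmodules_default <name> <key> [<file>]
gitmodules_default() {
	git config -f "${3:-.gitmodules}" --get "submodule.$1.$2"
}

# Hypothetical helper: follow branch.<name>.remote → remote.<name>.url
# for the branch currently checked out in the submodule at <path>.
# Usage: submodule_upstream_url <path>
submodule_upstream_url() {
	branch=$(git -C "$1" symbolic-ref --short HEAD) &&
	remote=$(git -C "$1" config --get "branch.$branch.remote") &&
	git -C "$1" config --get "remote.$remote.url"
}
```

Comparing 'gitmodules_default submod url' against
'submodule_upstream_url submod' would show whether a local override
has drifted from the superproject's suggestion.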

When you checkout a submodule for the first time, Git should take the
default information from .gitmodules and file it away in the
submodule's appropriate out-of-tree config locations. In .gitmodules,
the out-of-tree data listed above should be stored in:

  1. submodule.<name>.local-branch
  2. submodule.<name>.url
  4. submodule.<name>.update
  5. …

Once you have an in-tree way to specify defaults for this out-of-tree
information, you're going to have developers like me that just want to
stick with the defaults, following them through changes. That means
you'd like to have the “copy .gitmodules defaults into your
submodule's config” functionality that usually happens on the initial
submodule checkout happen on *every superproject-initiated checkout*.
In fact, I think life is easier for everyone if this is the default,
and we add a new option (submodule.<name>.sync = false) that says
“don't overwrite optional settings in my submodule's out-of-tree
config on checkout” for folks who want to opt out. Don't worry, this
is not going to clobber people, because we'll be syncing the other way
too.

Syncing up
----------

In the previous section I explained how data should flow from
.gitmodules into out-of-tree configs. What about the other direction?
We currently let folks handle this by hand, but I'd prefer a tighter
integration between the submodule config and the superproject tree to
avoid losing work. That means changes to tracked submodule status
(checked-out hash, checked-out branch, upstream URL, upstream branch,
default integration strategy, …) should trigger dirty-tree status just
like uncommitted changes to in-tree files. 'git add' (or stash) on
the dirty submodule would store changed commit hashes in the index,
pull changed out-of-tree configs back into the in-tree .gitmodules,
and add the new .gitmodules to the index. If the working .gitmodules
was already dirty (vs. the index), the add/stash should die without
making any changes.
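
The 'git add' behaviour described above could be mocked up roughly as
follows. This is a minimal sketch under stated assumptions: the
function name submodule_sync_up is hypothetical, it handles only the
checked-out branch and the gitlinked hash (not URLs or integration
settings), and it writes the proposed submodule.<name>.local-branch
key, which current Git does not interpret:

```shell
#!/bin/sh

# Hypothetical helper: sync one submodule's state up into the
# superproject index. Run from the top of the superproject work tree.
# Usage: submodule_sync_up <name> <path>
submodule_sync_up() {
	name=$1
	path=$2

	# Die without changing anything if the working .gitmodules
	# already differs from the index.
	if ! git diff --quiet -- .gitmodules
	then
		echo "fatal: .gitmodules has unstaged changes" >&2
		return 1
	fi

	# Pull the checked-out branch back into the in-tree .gitmodules.
	# This fails (and we bail out) if the submodule HEAD is detached.
	branch=$(git -C "$path" symbolic-ref --short HEAD) || return 1
	git config -f .gitmodules \
		"submodule.$name.local-branch" "$branch" || return 1

	# Stage the new .gitmodules and the submodule's current commit
	# (the gitlink).
	git add .gitmodules "$path"
}
```

Checking .gitmodules before writing anything keeps the “die without
making any changes” promise; a fuller version would also copy back the
optional settings unless submodule.<name>.sync is false.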

If the user has disabled syncing between .gitmodules and the
submodule's out-of-tree configs, then don't worry about optional
settings. Always sync the required settings, which at this point
would just be submodule.<name>.local-branch.

Purely local metadata
---------------------

Some metadata does not make sense in the superproject tree. For
example, whether a submodule is interesting enough to checkout
(init/deinit) or whether you want to auto-sync optional metadata from
the .gitmodules defaults. This metadata should live in the
superproject's out-of-tree config, and should not be stored in the
in-tree .gitmodules. Since you *will* want to share the upstream URL,
I proposed using an explicit submodule.<name>.active setting to store
the “do I care” information [2], instead of overloading
submodule.<name>.url (I'd auto-sync the .gitmodules
submodule.<name>.url with the subproject's remote.origin.url unless
the user opted out of .gitmodules syncing).

Subsequent checkouts
--------------------

Now that we have strict linking between the submodule state (both
in-tree and out-of-tree configs) and the superproject tree (gitlink
and .gitmodules), changing between superproject branches is really
easy:

  1. Make sure the working tree is not dirty. If it is, ask the user
     to either add-and-commit or stash, and then die to let them do
     so.
  2. Checkout the new superproject branch.
  2.1. For each old submodule that doesn't exist in the new branch,
       blow away the submodule directory (assuming a new-style
       .git/modules/… layout, and not an old-style submod/.git/…
       layout).
  2.2. For each gitlinked submodule that didn't exist in the old
       branch, setup the submodule as if you were doing the initial
       cloning checkout (forcing a new local-branch to point at the
       gitlinked commit). If you find local out-of-tree
       *superproject* configs that conflict with the .gitmodules
       values, prefer the superproject configs.
       Clobber submodule configs and local branches at will (modulo
       submodule.<name>.sync), because any submodule configs that the
       user wanted to keep should have been added to the superproject
       branch earlier (or stashed).

Integrating other branches
--------------------------

Merges and rebases can alter the submodule's in-tree configs (and
create and remove submodules). The existing logic for merging
.gitmodules and gitlinks works well, so stick with that. In the event
that there are unresolvable conflicts, bail out and let the user
resolve the conflicts and use 'git commit' to finish checking out the
resolved state.

Issues
------

I like the current submodule integration configuration:

* submodule.<name>.branch (specify the remote branch to integrate, but
  I'd prefer submodule.<name>.integration-ref for clarity).
* submodule.<name>.update (specify how to integrate it, but I'd prefer
  submodule.<name>.integration-mode for clarity).

more than the current core integration configuration:

* branch.<name>.merge (with branch.<name>.remote, the remote branch to
  integrate via merging).
* branch.<name>.rebase (override branch.<name>.merge to integrate via
  rebasing).

These seem to mix the orthogonal concepts of integration target and
integration mode, and the divergence from the .gitmodules
representation makes syncing awkward.

Summary
-------

New .gitmodules options:

* submodule.<name>.local-branch, store the submodule's HEAD, must stay
  in sync for checkouts.

New .git/config options:

* submodule.<name>.active, for init/deinit.
* submodule.<name>.sync, for whether you want to automatically sync
  the submodule's out-of-tree configs up to .gitmodules before
  checkout operations, and sync back from .gitmodules (possibly
  altered on the new branch) into the submodule's out-of-tree configs
  during checkout.

With this tighter binding, submodule information is either tracked in
the superproject, or explicitly not touched by the superproject.
That makes it much harder to break things or clobber a user's work,
and also much easier to keep submodules up to date with superproject
changes.

Users shouldn't have to explicitly manage their submodules to carry
out routine core tasks like checking out other branches. I see no
reason to add --recurse-submodules flags to 'git checkout' (and merge,
…). Anything that happens post-clone should recurse through
submodules automatically, and use the submodule.<name>.active setting
to decide when recursion is desired. I think the ideal
submodule-specific interface would be just:

* git submodule [--quiet] add [-b <branch>] [-f|--force]
    [--name <name>] [--reference <repository>] [--] <repository>
    [<path>]
* git submodule [--quiet] init [--] [<path>...]
* git submodule [--quiet] deinit [-f|--force] [--] <path>...
* git submodule [--quiet] foreach [--recursive] <command>

The current 'git submodule update --remote' would just be:

  $ git submodule foreach 'git pull'

because all of the local-branch checkouts would have already been
handled. Similarly, a global push would be just:

  $ git submodule foreach 'git push'

You get all the per-submodule configuration (for triangular workflows,
etc.) for free, with no submodule-specific confusion.

So, is this:

* Interesting enough to be worth pursuing?
* Simple enough to be easily understood?

I'd be happy to mock this up in shell, but only if anyone else would
be interested enough to review the implementation ;). Then I'll look
into integrating the preferred model (this tightly bound proposal, or
v3's looser bindings, or <your idea here>) in C, building on Jens and
Jonathan's work.

Cheers,
Trevor

[1]: http://article.gmane.org/gmane.comp.version-control.git/44162
[2]: http://article.gmane.org/gmane.comp.version-control.git/211042

-- 
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy