Let's start with a bit of context.

We have this __huge__ "put everything in it" repository at work, and we
want to strip out core modules and integrate them into our different
projects through submodules.

We moved one of our core libraries into its own git repository, and it
became a submodule in our big fat repository. I believe it's the kind of
thing we said people should do when they need partial checkouts
(tree-wise), so I assume the workflow I describe here is decent.

Just to make things clearer, we have two branches in this repository,
'maint' and 'master'. 'maint' is the branch for the production product,
'master' is the one where development happens. 'maint' is obviously
merged into 'master' on a regular basis.


Problem 1: directory/submodule conflicts (aka D/S)
---------

Our first problem was that git doesn't deal with D/S conflicts well. To
migrate our repository, I went into 'maint' and did:

    $ git rm -rf corelib
    $ git submodule add -b corelib/master -- <our-repo> corelib
    $ git commit -asm'replace corelib with a submodule'

Then I went into 'master' and did:

    $ git merge maint

Here it failed horribly, because it claimed that the merge would clobber
untracked files like corelib/.gitignore, which was previously a tracked
file in the huge repository and is now tracked in the submodule.

I worked around that with an intermediate commit that removes 'corelib'
in 'master'. Unpretty, but it works.

Later, when other developers updated their trees, they had all kinds of
really distasteful issues related to D/S conflicts.


Problem 2: integration with git-checkout
---------

When using submodules, when I make an update to the corelib, like fixing
a bug, and hence want it to appear in 'maint', I go to 'maint' and
basically do:

    $ cd corelib
    $ git fetch
    $ git reset --hard origin/corelib/master # so that I have the fix
    $ cd ..
    $ git commit -asm'update corelib for bug#nnn'

When I then run `git checkout master`, the corelib submodule, which had
no modifications in 'maint', remains in its 'maint' state on 'master'.
What I would like instead is to see it checked out to its 'master'
state, and to have the checkout refused if the submodule cannot perform
it. I'd really like git checkout -m to also perform a git checkout -m in
submodules.

And along the road, one gets a lot of frightening errors:

    fatal: cannot read object b8f1177da31281682feb79c9d4290a88edf067ae 'corelib~Updated upstream': It is a submodule!

I quite understand that in the presence of submodules git checkout
becomes quite a bit harder, as you have to check every submodule plus
yourself to know whether you can perform the checkout, but I don't
really see why it can't be done.


Problem 3: similar problem with git-reset
---------

Really, I type git reset --hard all the time to undo my local changes.
And I know while typing it that it destroys local changes. Really, it
should reset the submodules to their expected state as well.


Problem 4: merging
---------

When merging two branches, there is a strategy that I believe is
applicable to submodules. If one of the two submodule states is a direct
ancestor of the other, then the merge result shall be the descendant.
When the revisions are not in a direct line, then it shall be a conflict.


Problem 5: fetching
---------

`git fetch` should fetch submodules too. Arguably, if you type
`git fetch REMOTE`, then any submodule that has a corresponding "REMOTE"
configured should fetch it.


Notes:
-----

When you cannot know something required for conflict handling (e.g.
because you don't have enough history for the submodules), the command
shall fail and ask the user to fetch the submodules in question. IOW,
when you perform any action that involves submodules, each submodule
must be queried to know whether it can perform the action, and git shall
fail and do nothing if that's not the case.
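
To make the rule from Problem 4 and the "fail if history is missing"
behaviour above concrete, here is a minimal shell sketch. The function
name `resolve_submodule_merge` and its exit codes are made up for
illustration and are not an existing git command. It assumes a git that
provides `git -C` and `git merge-base --is-ancestor`.

```shell
#!/bin/sh
# Hypothetical helper, not part of git.
# Usage: resolve_submodule_merge <submodule-dir> <ours> <theirs>
# Prints the commit the merge should record for the submodule.
# Returns 1 on a real conflict, 2 when history is missing.
resolve_submodule_merge() {
    dir=$1
    ours=$2
    theirs=$3

    # Both commits must exist in the submodule, else we cannot decide.
    for rev in "$ours" "$theirs"; do
        if ! git -C "$dir" cat-file -e "$rev^{commit}" 2>/dev/null; then
            echo "missing $rev in $dir: fetch the submodule first" >&2
            return 2
        fi
    done

    if git -C "$dir" merge-base --is-ancestor "$ours" "$theirs"; then
        # ours is an ancestor of theirs (or equal): take the descendant
        echo "$theirs"
    elif git -C "$dir" merge-base --is-ancestor "$theirs" "$ours"; then
        # theirs is an ancestor of ours: keep ours
        echo "$ours"
    else
        # the two states have diverged: leave it to the user
        echo "conflict in $dir: $ours and $theirs have diverged" >&2
        return 1
    fi
}
```

Something like `resolve_submodule_merge corelib <sha-in-maint> <sha-in-master>`
would then print the descendant commit, or fail without touching
anything.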

Wrt most of the behaviours I described, I would be fine if they were
enabled only by a configuration flag in .gitmodules, which users can
override in their .git/config. We could have a
submodule.<module>.commandsMustRecurse setting to tell
fetch/reset/checkout/... to behave as I described for this module. I
believe that true should be the default.

Non-initialized submodules should be considered always up to date for
all of this, so that people who don't want to waste bandwidth on this or
that submodule can work peacefully.

Okay, I'm sure there are tons of other uses of submodules out there for
which this is overkill, but if we seriously intend to tell people "do
use submodules to avoid having incredibly huge repositories", like we
did in the past, we should really improve the overall usability.

-- 
·O·  Pierre Habouzit
··O  madcoder@xxxxxxxxxx
OOO  http://www.madism.org