submodules and interaction with GIT

Let's start with a bit of context. We have this __huge__ "put everything
in it" repository at work, and we want to strip out the core modules and
integrate them into our different projects through submodules. We moved
one of our core libraries into its own git repository, and it became a
submodule of our big fat repository. I believe that's the kind of thing
we have told people to do when they need partial checkouts (tree-wise),
so I assume the workflow I describe here is a reasonable one.

Just to make things clearer, we have two branches in this repository,
'maint' and 'master'. Maint is the branch for the production product,
master is the one where devel happens. 'maint' is obviously merged into
'master' on a regular basis.


Problem 1: directory/submodule conflicts (aka D/S)
---------

Our first problem was that git doesn't deal with D/S conflicts well.  To
migrate our repository, I went into 'maint' and did:

  $ git rm -rf corelib
  $ git submodule add -b corelib/master -- <our-repo> corelib
  $ git commit -asm'replace corelib with a submodule'

Then I went into 'master' and did:

  $ git merge maint

This failed horribly: git claimed that the merge would clobber untracked
files such as corelib/.gitignore, which used to be tracked in the huge
repository and is now tracked in the submodule.

I worked around that with an intermediate commit that removes 'corelib'
in 'master'. Not pretty, but it works.  Later, when other developers
updated their trees, they had all kinds of really distasteful issues
related to D/S conflicts.
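
For reference, the workaround on 'master' amounts to roughly the
following. This is only a sketch: the helper name is made up for the
illustration, and --no-edit is there merely to keep the merge from
opening an editor.

```shell
# Sketch of the workaround: drop the tracked directory in its own commit,
# then merge the branch that reintroduces the path as a submodule.
# merge_after_dropping_dir is a made-up helper name, not a git command.
merge_after_dropping_dir() {
    dir=$1      # e.g. corelib
    branch=$2   # e.g. maint
    git rm -r -q -- "$dir" &&
    git commit -q -s -m "remove $dir ahead of the submodule conversion" &&
    git merge --no-edit "$branch"
}

# Usage, from a clean checkout of 'master':
#   merge_after_dropping_dir corelib maint
```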



Problem 2: integration with git-checkout
---------

When I update corelib, say to fix a bug that I want to appear in
'maint', I go to 'maint' and basically do:

  $ cd corelib
  $ git fetch
  $ git reset --hard origin/corelib/master # so that I have the fix
  $ cd ..
  $ git commit -asm'update corelib for bug#nnn'

When I then `git checkout master`, the corelib submodule, which had no
local modifications in 'maint', stays in its 'maint' state. What I would
like instead is for it to be checked out to its 'master' state, and for
the checkout to be refused if the submodule cannot perform it.

I'd really like git checkout -m to also perform a git checkout -m in
submodules.
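
A rough approximation of that behaviour with existing commands could
look like the sketch below. The helper name is made up. Note that it is
not the atomic "refuse if the submodule cannot follow" behaviour asked
for above: if the submodule update fails, the superproject has already
switched branches.

```shell
# Switch branches, then bring every initialised submodule to the commit
# the new branch records for it.
# checkout_with_submodules is a made-up helper name, not a git command.
checkout_with_submodules() {
    git checkout "$1" &&
    git submodule update --recursive
}

# Usage:
#   checkout_with_submodules master
```

Since `git submodule update` is run without --init, submodules that were
never initialised are left alone.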

Along the way, one also gets a lot of frightening errors:
    fatal: cannot read object b8f1177da31281682feb79c9d4290a88edf067ae 'corelib~Updated upstream': It is a submodule!


I quite understand that in the presence of submodules git checkout
becomes quite a bit harder, as you have to check every submodule plus
yourself to know whether you can perform the checkout, but I don't
really see why it can't be done.


Problem 3: similar problem with git-reset
---------

Really, I type git reset --hard all the time to undo my local changes.
And I know while typing that it destroys local changes. Really, it
should reset the submodules to their supposed state as well.
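
With today's commands, the effect described here can be approximated by
hand. The following is a sketch under that assumption, with a made-up
helper name. It is just as destructive as `git reset --hard` itself.

```shell
# Hard-reset the superproject, throw away local changes in every
# initialised submodule, then move each submodule to the commit the
# superproject records for it.
# hard_reset_with_submodules is a made-up helper name, not a git command.
hard_reset_with_submodules() {
    git reset -q --hard "${1:-HEAD}" &&
    git submodule foreach --recursive 'git reset -q --hard' &&
    git submodule update --recursive
}

# Usage:
#   hard_reset_with_submodules          # same as: reset --hard HEAD
#   hard_reset_with_submodules HEAD~1
```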



Problem 4: merging
---------

When merging two branches, there is a strategy that I believe is
applicable to submodules. If one of the two submodule states is a direct
ancestor of the other, then the merge result shall be the descendant.

When the revisions are not in a direct line, it shall be a conflict.
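
The rule is simple enough to sketch in shell. The helper below is made
up for the illustration and is meant to be run inside the submodule's
repository. It prints the commit to pick, returns 1 when the two states
have diverged, and returns 2 when one of the commits is not available
locally.

```shell
# Pick the result of merging two submodule states: the descendant if one
# is an ancestor of the other, a conflict otherwise.
# resolve_submodule_merge is a made-up helper name, not a git command.
resolve_submodule_merge() {
    ours=$(git rev-parse --verify "$1^{commit}") || return 2
    theirs=$(git rev-parse --verify "$2^{commit}") || return 2
    # merge-base fails when there is no common ancestor at all
    base=$(git merge-base "$ours" "$theirs") || base=
    if [ "$base" = "$ours" ]; then
        echo "$theirs"      # ours is an ancestor of theirs (or equal)
    elif [ "$base" = "$theirs" ]; then
        echo "$ours"        # theirs is an ancestor of ours
    else
        echo "conflict: $ours and $theirs have diverged" >&2
        return 1
    fi
}
```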


Problem 5: fetching
---------

`git fetch` should fetch submodules too. Arguably, if you type `git
fetch REMOTE` then any submodule that has a corresponding "REMOTE"
configured should fetch it.
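
As an illustration of the "corresponding REMOTE" idea, here is a sketch
with a made-up helper name. It fetches the superproject, then fetches
the same remote in every initialised submodule that has a remote of
that name, and silently skips the others.

```shell
# Fetch REMOTE in the superproject, then in each initialised submodule
# that has a remote of the same name configured.
# fetch_with_submodules is a made-up helper name, not a git command.
fetch_with_submodules() {
    git fetch "$1" &&
    FETCH_REMOTE=$1 git submodule foreach --recursive '
        if git config --get "remote.$FETCH_REMOTE.url" >/dev/null; then
            git fetch "$FETCH_REMOTE"
        fi'
}

# Usage:
#   fetch_with_submodules origin
```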


Notes:
-----

When you cannot know something required for conflict handling (e.g.
because you don't have enough history for the submodules), the command
shall fail and ask the user to fetch the offending submodules. IOW, when
you perform any action that involves submodules, each submodule must be
queried to know whether it can perform the action, and git shall fail
and do nothing if that's not the case.

Wrt most of the behaviours I described, I would be fine if those were
enabled only by a configuration flag in .gitmodules that users can
override in their .git/config. We could have a
submodule.<module>.commandsMustRecurse setting to tell
fetch/reset/checkout/... to behave like I said with this module. I
believe that true should be the default.
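
To make the lookup order concrete, here is a sketch of how a script
could read such a flag. commandsMustRecurse is only the setting proposed
here, git itself knows nothing about it, and the helper name is made up
as well. It has to be run from the top of the working tree.

```shell
# Decide whether commands should recurse into submodule $1.
# Lookup order: the user's own git configuration, then .gitmodules,
# then the proposed default of true.
# must_recurse is a made-up helper name, not a git command.
must_recurse() {
    key="submodule.$1.commandsMustRecurse"
    git config --bool --get "$key" 2>/dev/null ||
    git config -f .gitmodules --bool --get "$key" 2>/dev/null ||
    echo true
}

# Usage:
#   if [ "$(must_recurse corelib)" = true ]; then ...; fi
```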

Uninitialized submodules should be considered always up to date for all
of this, so that people who don't want to waste bandwidth on this or
that submodule can work peacefully.



Okay, I'm sure there are tons of other uses of submodules out there for
which this is overkill, but if we seriously intend to tell people "do
use submodules to avoid having incredibly huge repositories" like we did
in the past, we should really improve the overall usability.
-- 
·O·  Pierre Habouzit
··O                                                madcoder@xxxxxxxxxx
OOO                                                http://www.madism.org