Re: git-submodule getting submodules from the parent repository

"Avery Pennarun" <apenwarr@xxxxxxxxx> · Tue, 1 Apr 2008 22:03:50 -0400

On 4/1/08, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> "Avery Pennarun" <apenwarr@xxxxxxxxx> writes:
>  > Instead of storing only the commitid of each submodule in the parent
>  > tree, store the current branch name as well.
>  > ...
>  > This way, cloning a project with submodules will work much like
>  > cloning the parent project; pushing and pulling the parent and the
>  > submodules will do as you expect.
>
> That goes quite against the fundamental design of git submodules in that
>  the submodules are by themselves independent entities.

Not sure what you mean here; the supermodule already stores the
commitid of the submodule.  All I'm proposing is that it also store
the default branchname (i.e. the branchname that the submodule was
using when its gitlink was checked into the supermodule) along with
that commitid.  The submodule never knows anything about the
supermodule.

>  An often-cited
>  example is an appliance project, where superproject bundles a clone of
>  Linux kernel and a clone of busybox repositories as its submodules.

What a coincidence!  This is almost exactly like my situation :)

>  If your superproject (i.e. the appliance product) uses two branches to
>  manage two product lines, named "v1" and "v2", these names are local to
>  the superproject.  It should not force the projects you borrow your
>  submodules from to have branches with corresponding name.

I meant that we should store the submodule's branch name when
committing the superproject, and put it back when checking out the
submodule fresh from the superproject.

>   - When not working in a particular submodule, but using it as a component
>    to build the superproject, it would be better to leave its HEAD
>    detached to the version the superproject points at.  IOW, usually you
>    won't have to be on any branch in submodules unless you are working in
>    them.

I agree that the submodule should have its HEAD pointing at exactly
the superproject-specified commit.  However, I believe this commit
should have a local branch name (in the subproject) attached to it, or
else (as I and my co-workers have frequently experienced) people will
accidentally commit on a detached HEAD, so that 'git push' silently
uploads nothing and the commits are easily lost track of.  I have lost
work this way.
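
The failure mode above can at least be detected before committing.  The
following is only a sketch: on_named_branch is a hypothetical helper
name, not an existing git command, and it relies on "git symbolic-ref -q
HEAD" exiting non-zero when HEAD is detached.

```shell
# Hypothetical helper: succeeds if the repository in directory $1 has
# HEAD on a named branch, fails if HEAD is detached.
on_named_branch() {
    # symbolic-ref -q exits non-zero (quietly) when HEAD is not a
    # symbolic ref, i.e. when it is detached.
    git -C "$1" symbolic-ref -q HEAD >/dev/null
}

# Example use before committing in a submodule (path is illustrative):
#   on_named_branch path/to/submodule ||
#       echo "warning: detached HEAD, new commits are on no branch" >&2
```

A check like this does not fix anything by itself; it only makes the
nameless-branch state visible before work is committed on top of it.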

The idea of giving the local subproject branch the same name it had
at checkin is that "git pull" in the subproject will then work
exactly as expected: it'll get you the latest version of the branch
the superproject developer was on.  But if you *don't* explicitly "git
pull" in the subproject, I'd expect (of course) the checkout to stick
to the commit specified by the superproject - and also to leave its
local branch name pointing at exactly that commit.
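
As a rough sketch of that behaviour with today's commands: attach_branch
is a hypothetical helper, and the branch name is passed in by hand
because nothing currently records it in the superproject.  "Has not
been changed" is approximated here as "the existing branch is an
ancestor of the target commit", which is an assumption, not part of the
proposal.

```shell
# Hypothetical helper: give the commit currently checked out in
# submodule directory $1 the local branch name $2.
attach_branch() {
    dir=$1
    branch=$2
    # The commit the superproject asked for (HEAD is typically detached).
    target=$(git -C "$dir" rev-parse --verify HEAD) || return 1
    if git -C "$dir" show-ref --verify --quiet "refs/heads/$branch"; then
        # Branch already exists: only move it if no commits would be
        # dropped, i.e. if it is an ancestor of the target commit.
        if ! git -C "$dir" merge-base --is-ancestor "refs/heads/$branch" "$target"; then
            echo "error: $branch has commits not contained in $target" >&2
            return 1
        fi
    fi
    # -B creates the branch at $target (or moves it there) and checks it out.
    git -C "$dir" checkout -q -B "$branch" "$target"
}
```

After this, the working tree is still at exactly the commit the
superproject specified, but new commits land on a named branch rather
than on a detached HEAD.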

>   - Sometimes you need to work in a submodule; e.g. you would want to add
>    'frotz' tool to your copy of busybox.  You chdir to the submodule
>    directory, and develop as if there is no superproject.

This is where my workflow is a bit different.  One of my subprojects
is a library that gets used by several application superprojects.  I
often add features to my library in the process of editing a
particular superproject.  I also expect my co-developers to want to do
the same.  Thus, the difference from your example is that I want to
streamline the process of working in a subproject as well as a
superproject, and minimize the chances of losing data in this case.

With the current system, it's too easy to make mistakes, and it takes
too many steps to fetch/merge/rebranch each submodule.

>    - Then work on adding that 'frotz' tool.  Make commits, test it in
>      isolation and test with superproject.  Push it out as whichever
>      remote branch the project policy asks you to.

As an orthogonal secondary wish, I'd like to have the subproject and
superproject hosted in the same remote repository.  This appears to be
possible (albeit inefficiently right now) by using "." as the remote
repo name in .gitmodules.  It would be more efficient if
git-submodule-update would use the superproject's checkout as a
--alternate when cloning the submodule... I think that would be easy
and harmless, right?
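
For illustration, the effect can be approximated by hand with "git
clone --reference", which records the reference repository's object
store in the new clone's .git/objects/info/alternates.  This is a
minimal sketch, not what git-submodule does today;
clone_borrowing_from and its argument names are hypothetical.

```shell
# Hypothetical helper: clone a submodule while borrowing objects from
# an existing local checkout instead of downloading them again.
clone_borrowing_from() {
    reference=$1   # local superproject checkout to borrow objects from
    url=$2         # submodule URL, e.g. the value stored in .gitmodules
    dest=$3        # where the submodule working tree should live
    git clone --quiet --reference "$reference" "$url" "$dest"
}

# Example (paths and URL are illustrative):
#   clone_borrowing_from . git://example.org/project.git lib/mylib
```

One caveat on "harmless": a clone made this way depends on the
reference repository's objects, so deleting or pruning the
superproject checkout can leave the submodule clone with missing
objects.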

To summarize all that, I think I'd like to make three changes to git
here:

1) When checking out a submodule from scratch, use the local
supermodule as a --alternate.  That way if both super and submodule
are hosted in the same remote repo, I don't have to clone them twice.
(And cloning my local repo to another copy doesn't stop git-submodule
from working.)

2) When checking out a submodule, give the submodule's current commit
a useful branch name (ideally, the name it had when the gitlink was
checked into the supermodule).  When updating a submodule with
git-submodule-update, quietly fix up the submodule's local branch ref
if it hasn't been changed; else produce a conflict of some sort.

3) Bonus: make "git push" operate recursively on submodules, and "git
pull" automatically run git-submodule-update.
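
Item 3 can be roughly emulated with wrapper functions.  This is a
sketch under stated assumptions: push_recursive and pull_and_update are
hypothetical names, "git submodule foreach" is assumed to be available,
and every submodule is assumed to be on a branch that plain "git push"
knows how to publish (a detached HEAD in a submodule will make the push
step fail or do nothing).

```shell
# Hypothetical helper: push all submodules (recursively) first, then
# the superproject in directory $1, so the superproject is never
# published pointing at submodule commits that were not pushed.
push_recursive() {
    git -C "$1" submodule foreach --recursive 'git push' &&
        git -C "$1" push
}

# Hypothetical helper: pull the superproject in directory $1, then
# bring each submodule to the commit the new superproject tree records.
pull_and_update() {
    git -C "$1" pull &&
        git -C "$1" submodule update --init
}
```

"git submodule foreach" stops at the first submodule whose command
fails, so with this ordering a failed submodule push also prevents the
superproject push.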

Does that make sense?

Thanks,

Avery