Re: Composing git repositories

Ramkumar Ramachandra <artagnon@xxxxxxxxx> · Thu, 28 Mar 2013 17:18:49 +0530

Junio C Hamano wrote:
> As I said in another thread, your top-level may be only a part in
> somebody else's project, and what you consider just a part of your
> project may be the whole project to somebody else.  If you pick one
> location to store both for the above clone, e.g. cgit/.git (it could
> be cgit/.ram-git or any other name), embedding it in a yet larger
> project (perhaps having both cgit and gitolite to give a one-stop
> solution for hosting services) later would face the same issue as
> Ram seemed to be complaining.  It needs to address what happens when
> that cgit/.git (or whatever name) gets in the way in the scope of
> the larger project.  That is why I said Ram's rant, using subjective
> words like "elegant", without sound technical justification, did not
> make much sense to me.

I was having a lot of difficulty writing down my thoughts.  Thank you
for providing an illustrative example.  It is terribly hard to do with
our current implementation: we'd have to rewrite the "gitdir: " lines
in all the .git files in the submodule trees and rebuild all the
.git/modules paths.  I'm thinking that we need to separate the object
stores from the worktrees for good.  For a project with no submodules,
the object store can be present in .git/ of the toplevel directory,
like it is now.  The moment submodules are added, all the object
stores should be relocated to a place outside the worktree.  So my
~/src might look like: dotfiles.git/, auto-complete.git/, magit.git/,
git-commit-mode.git/, yasnippet.git/ and dotfiles/.  dotfiles/
contains lots of worktrees stitched together nicely, pointing to these
object stores in ~/src.  This would certainly get rid of the asymmetry
for good.

Now, we can focus our attention on composing git worktrees.  What is a
worktree?  A tree object pointed to by the commit object referred to
by HEAD.  What we need to do is embed one tree inside another using a
mediating object to establish repository boundaries, while not
introducing an ugly seam.  If you think about it, the mediator we've
picked conveys little/ no information to the parent; it says: "there's
a commit with this SHA-1 present in this submodule, but I can't tell
you the commit message, tree object, branch, remote, or anything else"
(obviously because the commit isn't present in the parent's object
store).  So, the mediator might as well have been a SHA-1 string.  And
we have an ugly .gitmodules conveying the remote and the branch.  Why
can't we stuff more information into the mediating object and get rid
of .gitmodules altogether?

Okay, here's a first draft of the new design.  The new mediator object
should look like:

    name = git
    ref = v1.7.8

The name is looked up in refs/modules/<branch>, which in turn looks like:

    [submodule "git"]
        origin = gh:artagnon/git
        path = git
    [submodule "magit"]
        origin = gh:magit/magit
        path = git/extensions/magit

The ref could be 'master', 'HEAD~1', or even a commit SHA-1 (to do the
current anchored-submodules).
Finally, there's a .git file in the worktree, which contains a
"gitdir: " line pointing to the object store, as before.

This solves the two problems that I brought up earlier:
- Floating submodules (which are _necessary_ if you don't want to
propagate commits upwards to the root).
- Initializing a nested submodule without having to initialize all the
submodules in the path leading up to it.

However, I suspect that we can put more information the mediator
object to make life easier for the parent repository and make seams
disappear.  I'm currently thinking about what information git core
needs to behave smoothly with submodules.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html