Re: git submodules

On Mon, Jul 28, 2008 at 10:41:17PM +0000, Junio C Hamano wrote:
> I suspect the use of it may help the use case Pierre proposes, but its
> main attractiveness as I understood it back when we discussed the facility
> was that you could switch branches between 'maint' that did not have a
> submodule at "path" back then, and 'master' that does have one now,
> without losing the submodule repository.  When checking out 'master' (and
> that would probably mean you would update 'git-submodule init' and
> 'git-submodule update' implementation), you would instantiate subdirectory
> "path", create "path/.git" that is a regular file that points at
> somewhere inside the $GIT_DIR of superproject (say ".git/submodules/foo").
> By storing refs and object store safely away in the superproject
> $GIT_DIR, you can now safely switch back to 'maint', which would involve
> making sure there is no local change that will be lost and then removing
> the "path" and everything underneath it.

gitfiles look nifty for sure, though I've thought about it a bit, and
I wonder whether we want something a bit more powerful, though still in
the same vein.

If we look at submodules, I really believe that we would benefit a lot
from sharing the object directory across the supermodule and all its
submodules, for the following reasons:

  * It could make things like git-blame better: at work, it's common for
    us to move files across submodules: we have a stable library shared
    across projects, and we move into it C modules that have sat for
    quite some time in the applications and are stable enough. It's a
    pity to lose history then, whereas git could really guess about the
    move if it could see through gitlinks in the same object repository.
    Gitlinks are actually not very different from trees if you can look
    through them; it's just a matter of dereferencing twice instead of
    once.

  * For people that have made a subdirectory become a submodule (and
    it's also something that can happen) it's likely that lots of blobs
    are shared. It would end up taking less disk space.

  * It helps people fix situations where they pushed, without noticing,
    a supermodule recording a submodule state that never existed
    anywhere else. Since the object store is shared, that commit will
    never be pruned, so at _least_ one person on earth will never lose
    it. With detached heads everywhere it's very easy to leave a
    detached head unnamed and have it pruned at some point.

  * I _believe_ (just a hunch) that it helps to know whether it's
    possible to perform a "recursive" (wrt submodules)
    checkout/reset/$whatever, without having to spawn subcommands and
    similarly unpleasant stuff.
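
To make the "dereferencing twice" point concrete, here is a toy sketch in
Python. It is not git code: the object ids, the OBJECTS dictionary and
the resolve() helper are all made up for illustration, and the only
thing it assumes is that supermodule and submodule objects sit in one
shared store.

```python
# Toy object store: every object lives in ONE dict, as it would with a
# shared object directory. All ids and contents here are invented.
TREE, GITLINK, BLOB = "040000", "160000", "100644"

OBJECTS = {
    "blob-util": ("blob", b"int util(void);\n"),
    "tree-lib": ("tree", {"util.h": (BLOB, "blob-util")}),
    "commit-lib": ("commit", {"tree": "tree-lib"}),
    "tree-super": ("tree", {"lib": (GITLINK, "commit-lib")}),
}


def resolve(tree_id, path):
    """Return the id of the object that `path` names, seeing through gitlinks."""
    current = tree_id
    parts = path.split("/")
    for i, name in enumerate(parts):
        kind, entries = OBJECTS[current]
        if kind != "tree":
            raise KeyError("%r is not a directory" % "/".join(parts[:i]))
        mode, oid = entries[name]
        is_last = i == len(parts) - 1
        if mode == GITLINK and not is_last:
            # Second dereference: gitlink -> commit -> that commit's tree.
            oid = OBJECTS[oid][1]["tree"]
        current = oid
    return current
```

With this, resolve("tree-super", "lib/util.h") walks from the
supermodule tree into the submodule's blob, which is only possible
because the submodule commit is present in the same store.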


However, we would not want submodules to suffer from reachability
issues after a prune in the supermodule. That means that all references
and reflogs of the submodules must be accessible from the supermodule,
so that anything that could mess with the object store by removing
objects cannot remove interesting ones (that should actually limit the
affected code paths to very few places).


So what I've thought about is extending gitfiles so that they can
define where to find not only the git_dir but also the object store,
moving from the current "faked symlink" approach to a less terse file
that looks like a standard git-config one. E.g.:

    [gitfile]
        git_dir = some/path/.git/submodules/foo/
        objects = some/path/.git/objects
        # why not other settings in the future?
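
A minimal sketch of how such a file could be read while still accepting
today's one-line "gitdir: <path>" gitfile. It is written in Python for
brevity; parse_gitfile() is a hypothetical helper, and the [gitfile]
section and its keys are only the proposal above, not an existing
format.

```python
def parse_gitfile(text):
    """Parse a legacy gitfile or the proposed config-style one.

    Returns a dict of the settings found; 'git_dir' is always present.
    """
    stripped = text.strip()
    if stripped.startswith("gitdir:"):
        # Existing one-line format: "gitdir: <path>".
        return {"git_dir": stripped[len("gitdir:"):].strip()}

    settings = {}
    in_section = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith(";"):
            continue  # blank line or comment
        if line.startswith("["):
            # Only the (proposed) [gitfile] section is interpreted.
            in_section = line == "[gitfile]"
            continue
        if in_section and "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    if "git_dir" not in settings:
        raise ValueError("gitfile does not name a git_dir")
    return settings
```

Unknown keys are kept rather than rejected, so later settings could be
added without breaking older readers.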

This part is quite easy and straightforward (and it can be done while
keeping backward compatibility with the current way gitfiles work).
What I can't decide is how we deal with the reflogs and references. I
see two choices, assuming the submodules' git_dirs are under the
supermodule's $GIT_DIR/submodules/$name_of_the_super_module/:

  (1) we do nothing more.

  (2) we merge the submodules' reflogs and references into the
      supermodule's, with appropriate namespacing. For example, a
      submodule named "foo/bar" would have its reflogs live in the
      supermodule's .git/logs/submodules/foo/bar/logs/* and its
      references under .git/refs/submodules/foo/bar/refs/*. For that we
      add 'logs =' and 'refs =' to the gitfile.
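
The namespacing in choice (2) can be sketched as a pure path mapping.
This is a Python illustration only; super_ref_path() and
super_log_path() are hypothetical names, and paths are relative to the
supermodule's $GIT_DIR.

```python
import posixpath


def _check(ref):
    if not ref.startswith("refs/"):
        raise ValueError("expected a name under refs/: %r" % ref)


def super_ref_path(submodule, ref):
    """Where a submodule ref such as 'refs/heads/master' would be stored."""
    _check(ref)
    return posixpath.join("refs/submodules", submodule, ref)


def super_log_path(submodule, ref):
    """Where the reflog of that submodule ref would be stored."""
    _check(ref)
    return posixpath.join("logs/submodules", submodule, "logs", ref)
```

For submodule "foo/bar", refs/heads/master would end up at
refs/submodules/foo/bar/refs/heads/master, and its reflog at
logs/submodules/foo/bar/logs/refs/heads/master.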

The first approach needs us to be able to somehow recurse under
.git/submodules to work out what in there looks like a git_dir, and to
teach reachability commands to look at the refs inside them. It can be
quite a lot of work, especially since we can have submodules inside
submodules at some point.
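
A rough Python sketch of the discovery step the first approach needs.
The "looks like a git_dir" test (a HEAD file plus a refs/ directory) is
an assumption, as is the idea that nested submodules live under a
submodules/ directory inside their parent's git_dir; both helper names
are made up.

```python
import os


def looks_like_git_dir(path):
    # Heuristic (assumption): a HEAD file and a refs/ directory. There is
    # no objects/ check because the object store would be shared.
    return (os.path.isfile(os.path.join(path, "HEAD"))
            and os.path.isdir(os.path.join(path, "refs")))


def find_submodule_git_dirs(submodules_root):
    """Return every directory under submodules_root that looks like a git_dir."""
    found = []
    for dirpath, dirnames, _ in os.walk(submodules_root):
        if looks_like_git_dir(dirpath):
            found.append(dirpath)
            # Inside a git_dir, only its own submodules/ directory can
            # contain further git_dirs; skip refs/, logs/ and the rest.
            dirnames[:] = [d for d in dirnames if d == "submodules"]
    return sorted(found)
```

Every reachability command would then have to collect refs and reflogs
from each directory returned here, which is where most of the work in
this approach would go.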

The second approach has the net benefit that no pruning command has to
be modified to work. Many commands that we want to act on the global
repository will just work. Though, we have to fix a couple of issues
too:
  (1) be able to have a references directory that is not .git/refs. I
      looked at the source, I believe only 3 or 4 places in the C code
      have to be fixed for that to work, maybe a bit more in the shell
      commands, but that should be fairly easy.

  (2) it will break reference packing, because the submodules won't see
      the supermodule's packed-refs file, so we will probably have to
      draft a new packed-refs scheme. A simple possibility I see is to
      move packed-refs to refs/.packed-refs (a name starting with a dot
      cannot be a reference name). Then teach git-pack-refs to generate
      a .packed-refs each time it crosses a 'refs/' directory name, and
      finally teach the readers how to load those. And no, it won't
      require recursing into the whole refs/: we can record in the
      toplevel refs/.packed-refs that it has submodules and that there
      is a nested file at refs/submodules/foo/bar/refs/.packed-refs,
      and so avoid the costly recursion.

  (3) we will have to teach for_each_ref to skip "/submodules",
      which I believe is fairly easy.
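
Issues (2) and (3) can be sketched together in Python. This only models
the bookkeeping on a plain dict of ref names; the function names are
invented, the on-disk format is left out, and it assumes no branch name
itself contains a "refs" path component.

```python
SUB_PREFIX = "refs/submodules/"
TOP_FILE = "refs/.packed-refs"


def packed_refs_file(name):
    """Return the .packed-refs file that would hold ref `name`."""
    if name.startswith(SUB_PREFIX):
        # The innermost 'refs/' directory crossed decides the file.
        cut = name.rfind("/refs/")
        if cut != -1:
            return name[:cut] + "/refs/.packed-refs"
    return TOP_FILE


def pack_refs(refs):
    """Split {name: id} into per-file groups plus the list of nested files.

    The nested list is what the toplevel file would record, so loading
    never has to scan all of refs/ to find the other files.
    """
    files = {TOP_FILE: {}}
    for name, oid in sorted(refs.items()):
        files.setdefault(packed_refs_file(name), {})[name] = oid
    nested = sorted(f for f in files if f != TOP_FILE)
    return files, nested


def for_each_ref(refs):
    """Yield the supermodule's own refs, skipping the submodules namespace."""
    for name in sorted(refs):
        if not name.startswith(SUB_PREFIX):
            yield name, refs[name]
```

Pruning code would iterate over everything, while user-facing listings
would go through the filtered for_each_ref().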


I personally like the second approach better because it will scale
better (I believe) when people put submodules into submodules into
submodules. But I'm unsure whether it's too disruptive or not.

So... comments, thoughts, and remarks are welcome :)



Note: with enhanced gitfiles, and making workdirs use gitfiles, with any
      of those approaches, it's easy to make workdirs that won't have
      the "if we repack we may lose things referenced from other
      workdir's reflogs" problem anymore. Which is kind of a nifty side
      effect ;)
-- 
·O·  Pierre Habouzit
··O                                                madcoder@xxxxxxxxxx
OOO                                                http://www.madism.org
