Re: .gitlink for Summer of Code

On Tue, 27 Mar 2007, Josef Weidendorfer wrote:

> On Tuesday 27 March 2007, Junio C Hamano wrote:
> > Martin Waitz <tali@xxxxxxxxxxxxxx> writes:
> > 
> > > For submodules I currently use <parent>/.git/objects/module/<submodule>/
> > > to store the objects belonging to the submodule.
> > 
> > I was not following the gitlink discussion closely, but what is
> > the motivation behind this separation of the object store?
> 
> The separation issue is about scalability of submodules, and not
> directly about gitlink.

NOTE! It's fine to share the *object*store* for a supermodule setup.

The scalability concerns are not about the number of objects, but about 
the operations that work on them, and specifically *traverse* the objects.

So while it's fine to share the same GIT_OBJECT_DIRECTORY for all the 
submodules, it's *not* ok if "git clone" on a supermodule will consider 
things to be one single repository, and clone it as one huge thing, 
generating (and having to look up!) a ten-million object pack for a 
hundred smaller projects. THAT won't scale.

Basically, a "git-rev-list --objects HEAD" in the super-module should only 
list the objects in the supermodule itself, not in all the submodules. And 
that implies that cloning a supermodule is not about cloning a single big 
repository: it would be a matter of:

 - first cloning the supermodule itself (which is often fairly 
   small: just a top-level directory, with some top-level Makefiles and a 
   number of directories that are submodules)

 - then parsing some supermodule data structure, and cloning each 
   submodule individually.

Similarly for "fetch" (and merging too, of course - it ends up having to 
merge each sub-project separately). 
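
The two-step clone described above can be sketched roughly as follows. This is only an illustration in Python, not how git implements anything: the `SUBMODULES` mapping stands in for the "supermodule data structure" (its real format is not specified here), and the URLs and the `plan_clone` helper are hypothetical. The sketch only builds the `git clone` command lines, so nothing is actually fetched.

```python
from typing import Dict, Iterable, List

# Hypothetical stand-in for whatever structure the supermodule uses to
# record its submodules: path inside the supermodule -> clone URL.
SUBMODULES: Dict[str, str] = {
    "bin/ls": "git://example.org/world/ls.git",
    "bin/sh": "git://example.org/world/sh.git",
    "lib/libc": "git://example.org/world/libc.git",
}


def plan_clone(super_url: str, dest: str, submodules: Dict[str, str],
               wanted: Iterable[str]) -> List[List[str]]:
    """Return the git commands for a supermodule clone, one per repository.

    The supermodule is cloned first, on its own. Each wanted submodule is
    then cloned as a separate repository. Submodules that are not wanted
    never appear in the plan.
    """
    commands = [["git", "clone", super_url, dest]]
    for path in wanted:
        if path not in submodules:
            raise KeyError(f"unknown submodule: {path}")
        commands.append(["git", "clone", submodules[path], f"{dest}/{path}"])
    return commands


if __name__ == "__main__":
    # Clone the supermodule plus a single submodule out of three.
    for cmd in plan_clone("git://example.org/world.git", "world",
                          SUBMODULES, ["lib/libc"]):
        print(" ".join(cmd))
```

The point of the design is that the cost of the plan grows with the number of submodules you ask for, not with the number the supermodule contains.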

Think of it this way: if you think people find it a bit annoying that you 
currently have to get all the history when you do a clone (which is why 
people have worked on "shallow clones" in git), imagine just *how* frustrating it 
is if you have to get all five-hundred subprojects when you only want to 
work on one small one!

Think of something like a huge *BSD "world" tree, where the supermodule 
contains *everything*. Do you really _really_ expect that every single 
developer wants to clone it all? I have no idea how much that is, but I 
can well imagine that it's several thousand subprojects, some of which are 
quite big in their own right. 

Also, imagine the server side.. Anybody who thinks that the server wants 
to (or is even *able* to) do things like a fsck on the totality, or keep 
every single object in memory, is in for a nasty surprise..

So I think that:

 - sharing object directories should not be a requirement, but it should 
   certainly be *possible*. Quite often you might want to do it, although 
   for really big superprojects it might well make sense to have 
   individual object stores too.

 - walking the *global* object list is simply not possible. You need to 
   fsck every single subtree individually, and fsck the superproject on 
   its own, *without* recursing into the subprojects. And you need to be 
   able to clone the superproject and only one or two subprojects, and 
   never see it as one "atomic" big repository.
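
A toy model may make the "without recursing" rule concrete. The sketch below is an assumption-laden illustration in Python, not git's object format: the object ids are made up, and the "link" kind is a hypothetical marker for a reference from a superproject tree to a subproject commit. All objects live in one dictionary (a shared object store), yet each walk stays inside its own project.

```python
from typing import Dict, List, Set, Tuple

# Toy object graph: object id -> list of (kind, target id) references.
# "link" marks a reference from a superproject tree to a subproject
# commit. All ids and the "link" kind name are made up for illustration.
Graph = Dict[str, List[Tuple[str, str]]]

OBJECTS: Graph = {
    "super-commit": [("tree", "super-tree")],
    "super-tree": [("blob", "Makefile"), ("link", "sub-commit")],
    "Makefile": [],
    "sub-commit": [("tree", "sub-tree")],
    "sub-tree": [("blob", "main.c")],
    "main.c": [],
}


def reachable(graph: Graph, start: str) -> Set[str]:
    """Collect the objects reachable from start without crossing a link."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        for kind, target in graph[oid]:
            if kind != "link":  # stop at the subproject boundary
                stack.append(target)
    return seen


def subproject_heads(graph: Graph, start: str) -> Set[str]:
    """Return the subproject commits that the walked project points at."""
    return {target
            for oid in reachable(graph, start)
            for kind, target in graph[oid]
            if kind == "link"}
```

Walking from "super-commit" visits only the superproject's own three objects. The subproject is checked (or cloned) by a separate walk that starts from each id returned by `subproject_heads`, so no single traversal ever has to hold the whole "world" at once.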

I really think people should think about the *BSD kind of "world" setup. 
You absolutely do _not_ want supermodules to be indivisible "everything or 
nothing" kind of things. You want submodules to be very much separate 
repositories, although you *can* of course share the object store if you 
want to (the same way git can do it between any number of totally 
unrelated repositories!)
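
As one possible illustration of optional sharing: git can borrow objects from another object directory listed in a repository's `objects/info/alternates` file, one path per line. Whether that is the mechanism meant above is an assumption, and the `add_alternate` helper and the directory layout in the usage note are hypothetical. A minimal Python sketch:

```python
import os
import tempfile


def add_alternate(git_dir: str, shared_objects: str) -> str:
    """Append shared_objects to git_dir/objects/info/alternates.

    Returns the path of the alternates file. The repository keeps its own
    refs and its own identity; only object lookup can fall back to the
    shared directory.
    """
    info_dir = os.path.join(git_dir, "objects", "info")
    os.makedirs(info_dir, exist_ok=True)
    alternates = os.path.join(info_dir, "alternates")
    with open(alternates, "a", encoding="utf-8") as fh:
        fh.write(os.path.abspath(shared_objects) + "\n")
    return alternates
```

For example, calling `add_alternate("world/lib/libc/.git", "shared/objects")` for some submodules and not for others keeps sharing a per-repository choice rather than a requirement.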

		Linus