Re: [RFC] Submodules in GIT

On Sat, 2 Dec 2006, Josef Weidendorfer wrote:
> 
> So you are for a global submodule namespace in supermodule repositories,
> do I understand correctly?
> 
> Otherwise, how would you specify the submodules at clone time given the
> ability that submodule roots can have relative path changed arbitrarily
> between commits?

The only _true_ namespace would be the SHA1 of the commit (and maybe allow 
a pointer to a tag too, but the namespace ends up being the same).

How to _find_ a repository that contains that SHA1 must be left to higher 
levels. After all, repositories move around, and the place you found them 
originally is not a stable name.
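To make the split concrete, here is a minimal sketch of what such a "higher level" lookup might look like, assuming it is handed a list of candidate locations and some way to ask whether a location has a given commit. The names `find_repository` and `has_commit` and the URLs are hypothetical, not part of git. The point is that the commit SHA1 is the stable key and every location is just a hint that may have gone stale.

```python
from typing import Callable, Iterable, Optional

def find_repository(commit_sha1: str,
                    candidates: Iterable[str],
                    has_commit: Callable[[str, str], bool]) -> Optional[str]:
    # The commit SHA1 is the only stable name; each candidate location is
    # just a hint that may have moved since it was recorded.
    for location in candidates:
        if has_commit(location, commit_sha1):
            return location
    # No known location has the commit: the caller must look elsewhere.
    return None

# Usage with made-up locations; the lambda stands in for a real lookup.
known = {
    "git://old.example.org/xyzzy.git": set(),
    "git://new.example.org/xyzzy.git": {"0215ffb08ce99e2bb59eca114a99499a4d06e704"},
}
print(find_repository("0215ffb08ce99e2bb59eca114a99499a4d06e704",
                      known, lambda loc, sha: sha in known[loc]))
```

Here the old location no longer has the commit, so the lookup falls through to the new one, and nothing recorded in the superproject had to change.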

So within the supermodule, on a "git object" level, a submodule should 
just be named by the SHA1 that was its HEAD when it was committed within 
the supermodule. So in the "tree object", you'd see something like the 
following when you go "git ls-tree HEAD" on the superproject:

	...
	100644 blob 08602f522183dc43787616f37cba9b8af4e3dade	xdiff-interface.c
	100644 blob 1346908bea31319aabeabdfd955e2ea9aab37456	xdiff-interface.h
	040000 tree 959dd5d97e665998eb26c764d3a889ae7903d9c2	xdiff
	050000 link 0215ffb08ce99e2bb59eca114a99499a4d06e704	xyzzy

where that 050000 is the new magic type (I picked one out of my *ss: it's 
not a valid type for a file mode, so it's a good choice, but it could be 
anything that cannot conflict with a real file), which just specifies the 
"link" part. The SHA1 is the SHA1 of the commit, and the "xyzzy" is 
obviously just the name within the directory of the submodule.
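As an illustration of how little a tool needs to know to pick such entries out, here is a rough sketch that parses `git ls-tree`-style lines ("<mode> <type> <sha1>", a tab, then the name) and keeps only the link entries. The "050000"/"link" pair is taken from the proposal above and is an assumption, not necessarily what any released git prints; `parse_ls_tree_line` and `links_in` are hypothetical helpers.

```python
from typing import List, NamedTuple

# Mode and type string taken from the proposal above (hypothetical values).
LINK_MODE = "050000"
LINK_TYPE = "link"

class TreeEntry(NamedTuple):
    mode: str
    kind: str
    sha1: str
    name: str

def parse_ls_tree_line(line: str) -> TreeEntry:
    # "<mode> <type> <sha1>\t<name>": the name follows a single tab and may
    # itself contain spaces, so split on the tab first.
    meta, name = line.rstrip("\n").split("\t", 1)
    mode, kind, sha1 = meta.split()
    return TreeEntry(mode, kind, sha1, name)

def links_in(listing: str) -> List[TreeEntry]:
    entries = [parse_ls_tree_line(line) for line in listing.splitlines() if line.strip()]
    # A link entry names a commit (the subproject HEAD), not a blob or tree.
    return [e for e in entries if e.mode == LINK_MODE and e.kind == LINK_TYPE]

listing = (
    "100644 blob 08602f522183dc43787616f37cba9b8af4e3dade\txdiff-interface.c\n"
    "040000 tree 959dd5d97e665998eb26c764d3a889ae7903d9c2\txdiff\n"
    "050000 link 0215ffb08ce99e2bb59eca114a99499a4d06e704\txyzzy\n"
)
for entry in links_in(listing):
    print(entry.name, entry.sha1)
```

Run against the three sample lines, only the xyzzy entry comes back, carrying the commit SHA1 and nothing else.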

That's all that is actually required for a lot of git commands that 
already expect all objects to be available (ie "git checkout", "git diff" 
etc).
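For example, a supermodule-level diff can treat a link much like a blob: if the recorded commit SHA1 differs between two trees, the subproject moved. The sketch below assumes trees are plain dicts mapping path to (mode, sha1) and reuses the proposal's hypothetical "050000" mode; `diff_links` is a made-up name, not a git function.

```python
from typing import Dict, List, Optional, Tuple

# Path -> (mode, sha1). "050000" is the hypothetical link mode proposed above.
Tree = Dict[str, Tuple[str, str]]
LINK_MODE = "050000"

def diff_links(old: Tree, new: Tree) -> List[Tuple[str, Optional[str], Optional[str]]]:
    changes = []
    for path in sorted(old.keys() | new.keys()):
        before, after = old.get(path), new.get(path)
        # Skip paths that are not a link on either side of the diff.
        if not any(e is not None and e[0] == LINK_MODE for e in (before, after)):
            continue
        old_sha = before[1] if before else None
        new_sha = after[1] if after else None
        # Only the recorded SHA1s are compared: equal means "unchanged".
        if old_sha != new_sha:
            changes.append((path, old_sha, new_sha))
    return changes

# Usage with made-up SHA1s for the "after" side.
old = {"xyzzy": ("050000", "0215ffb08ce99e2bb59eca114a99499a4d06e704"),
       "xdiff-interface.c": ("100644", "08602f522183dc43787616f37cba9b8af4e3dade")}
new = {"xyzzy": ("050000", "a" * 40),
       "xdiff-interface.c": ("100644", "b" * 40)}
print(diff_links(old, new))
```

The blob change is ignored here only because this sketch reports links; an ordinary diff would list both.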

It only gets interesting for commands that fetch new objects, ie do a 
"pull/fetch" op, and you'd need to know where/how to fetch new objects for 
the xyzzy subproject, so that's a "naming" issue. You have a few choices:

 - get all the objects directly from the subproject as if it was one big 
   project.

   I actually think this sucks. Why? Because it puts an insane load on the 
   server side, which basically needs to traverse the object list of the 
   _sum_ of all projects. An initial clone (or a really big pull, which 
   comes to the same thing) would be absolutely horrendous.

So I'd strongly argue against that approach, for scalability reasons. 
Instead, you should really try to do pulls etc. one git repo at a time:

 - take the "list of subprojects" from the supermodule, and pull them all 
   one by one.

   This again makes subprojects "less seamless", and makes each subproject 
   more of a separate thing, with the project list gotten from the 
   superproject and parsed separately. But it means you have none of the 
   scalability problems, since you never see things as one huge project 
   with millions of files and even more objects.
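The "one repo at a time" loop could be sketched roughly as below, assuming the link entries have already been read out of the superproject as a path-to-commit map. `fetch_subprojects`, `locate` and `fetch_one` are hypothetical: `locate` stands for whatever higher-level mechanism answers the naming question, and `fetch_one` for an ordinary single-repository fetch.

```python
from typing import Callable, Dict, List, Tuple

def fetch_subprojects(links: Dict[str, str],
                      locate: Callable[[str], str],
                      fetch_one: Callable[[str, str], None]) -> List[Tuple[str, str]]:
    """Fetch every subproject as its own repository, one at a time.

    links: path in the superproject -> commit SHA1 from its link entry.
    locate: path -> repository URL (the hypothetical naming layer).
    fetch_one: an ordinary single-repository fetch of (url, commit).
    """
    fetched = []
    for path, commit in sorted(links.items()):
        url = locate(path)
        # Each call only ever touches one repository's objects, so no server
        # has to walk the sum of all projects at once.
        fetch_one(url, commit)
        fetched.append((path, url))
    return fetched

# Usage with a fake fetch that just records what it was asked to do.
calls = []
urls = {"lib/foo": "git://example.org/foo.git", "xyzzy": "git://example.org/xyzzy.git"}
result = fetch_subprojects(
    {"xyzzy": "0215ffb08ce99e2bb59eca114a99499a4d06e704", "lib/foo": "c" * 40},
    urls.__getitem__,
    lambda url, commit: calls.append((url, commit)),
)
print(calls)
```

Nothing in the loop knows about the other subprojects, which is what keeps the per-fetch cost bounded by the size of one repository.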

The second approach also means that you can see the "supermodule" support 
in git as less of a "plumbing" thing, and it's largely just a thin veneer 
around the core plumbing that really doesn't understand about multiple 
repositories at all (apart from the single "link" extension in the tree 
object), and it's really just scripting to get the subprojects to "look" 
like one thing, when they really are pretty much independent.

		Linus