Re: [RFC] Submodules in GIT

"Torgil Svensson" <torgil.svensson@xxxxxxxxx> · Sat, 16 Dec 2006 00:43:56 +0100

On 12/15/06, Josef Weidendorfer <Josef.Weidendorfer@xxxxxx> wrote:
> That all sounds fine, but how do you create such symlinks in practice?

I'm very open to suggestions here, but the concept growing in my head
is based around Linus' 'module'-file and keeping things simple: a git
configuration file that specifies:
* link name for reference
* local path to link
* submodule source
* submodule path to tree/blob
* submodule commit / HEAD / branch
* options (depth-limit, ...)
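
As a rough sketch of what such a file could carry (this is not an existing
git format; the section name, the key names and the example.org URL are all
made up for illustration), here is one entry with the fields listed above,
read with Python's configparser:

```python
import configparser

# Hypothetical modules file. Every section and key name here is invented
# for illustration; nothing like this is defined by git itself.
MODULES_TEXT = """
[link "cairo-headers"]
path = include/cairo
source = git://example.org/cairo.git
subpath = src/cairo.h
rev = HEAD
depth = 1
"""


def parse_modules(text):
    """Return {link name: {field: value}} for every [link "name"] section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    links = {}
    for section in parser.sections():
        kind, _, name = section.partition(" ")
        if kind != "link":
            continue  # ignore sections that are not link definitions
        links[name.strip('"')] = dict(parser[section])
    return links


modules = parse_modules(MODULES_TEXT)
```

Here "path" is the local path to the link, "subpath" the tree/blob inside the
submodule, and "rev" the commit / HEAD / branch to follow.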

I'm reconsidering having the path-name in the link; it should be
sufficient to have two SHA1s, one for the commit and one for the
tree/blob. The super-module should have the tree/blob in its database so
that the link part is only there for version information and reference
(checking dirty state or history on the submodule). This way it is easy
to clone the super-project and use it without having to map up all
sub-project sources. Sub-project sources are not important for version
information and could always be specified in the project in a
README-type of file.
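
To make the two-SHA1 idea a bit more concrete, a minimal sketch could look
like this. The class and function names are hypothetical, and the all-"a" /
all-"b" strings are placeholders rather than real object ids:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubmoduleLink:
    """Hypothetical link entry: no path name, just two object ids."""
    commit_sha1: str  # submodule commit the content was taken from
    object_sha1: str  # tree/blob that the super-module stores itself


def is_dirty(link, current_object_sha1):
    """True if the checked-out tree/blob differs from the recorded one."""
    return link.object_sha1 != current_object_sha1


# Placeholder ids, only here to exercise the sketch.
link = SubmoduleLink(commit_sha1="a" * 40, object_sha1="b" * 40)
```

The commit id is only consulted for version information; the content itself
is reachable through the tree/blob id, so a clone of the super-project is
usable without the submodule source.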

> Especially, what is the SCM user supposed to do to change the link
> target, ie. from
>  <commit>/path/to/subtree
> to
>  <commit>/path2/to2/subtree2
> ?
> Should this do a re-checkout at the other point?

That would be a change in the modules file, maybe through a command
that also fixes the link. The link will have to be updated in the
index and committed as normal.

> By linking a file from a submodule, such a link seems to force that
> this file has to be at a fixed position in the submodule. Otherwise,
> some magic has to happen when the file is moved in the submodule,
> possibly leading to a dangling link, eg. if the whole subdirectory
> specified in the link is removed.

Since we have the SHA1 (this is what we're using) and tree/blob
information in the super-module's database, the change itself is not a
problem. The problem is to track renames/moves and your remove case in
the submodule. The tool that tracks the submodule should probably
warn/exit here and we would fix up the modules file manually.
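
A sketch of that warn/exit check, assuming for illustration that a submodule
tree is modelled as nested dicts (directory name to subtree, file name to a
blob id string). The names DanglingLinkError and resolve are made up:

```python
class DanglingLinkError(Exception):
    """Raised when the path named in the modules file no longer resolves."""


def resolve(tree, path):
    """Walk 'path' through a nested-dict tree and return what it points at.

    Raises DanglingLinkError if any component is missing, which is the
    case after a rename, move or remove in the submodule.
    """
    node = tree
    for part in path.split("/"):
        if not isinstance(node, dict) or part not in node:
            raise DanglingLinkError(path)
        node = node[part]
    return node


# Toy submodule trees before and after upstream renamed a directory.
old_tree = {"src": {"cairo.h": "blob-1"}}
new_tree = {"include": {"cairo.h": "blob-1"}}
```

With these toy trees, resolve(old_tree, "src/cairo.h") returns "blob-1",
while the same call on new_tree raises, and that is the point where the
tool would stop and ask for the modules file to be fixed up by hand.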

> IMHO this is getting way too complex.

One of the complex situations here, as I see it, is the ability to
track/checkout only a subset (tree/blob) of the submodule. This is
also quite an important feature - in my example it means the
difference between tracking one header file and the whole source.

> If you only want to check out part of a submodule, this should be
> done with path-limiting checkouts, which should be a feature totally
> independent from submodules.

If we can do path-limiting checkouts on a repo (module) we also can do
it on a sub-module since they are exactly the same. This is a very
powerful feature and it'd be a huge waste if it wasn't allowed for a
super-module to do on submodules.
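
As an illustration of why it is the same operation in both cases, a
path-limited checkout boils down to a filter over the paths of a tree. This
sketch works on a flat list of paths; the function name and the sample
paths are invented:

```python
def path_limited(paths, limit):
    """Return only the paths at or below 'limit'.

    Matching is done on whole path components, so a limit of "src" does
    not accidentally select "src2/file.c".
    """
    limit = limit.rstrip("/")
    return [p for p in paths if p == limit or p.startswith(limit + "/")]


# The same filter applies whether the paths come from a module or from a
# submodule; nothing in it knows about the difference.
submodule_paths = ["src/cairo.h", "src/cairo.c", "src2/other.c", "README"]
```

Limiting to "src/cairo.h" gives the single-header case from my example,
and limiting to "src" gives the whole subtree.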

> And if you want to limit the number of objects transferred in cloning
> of a subproject, it is better to further split this subproject into
> multiple subprojects itself.

What if we have no control of the submodule? It can be tracked from
upstream, sourceforge, another company, etc. Submodules will often
live their own life and could be X, kernel, gcc, cairo, whatever, ...

> The problem is not the representation in the git repository, but the
> checked out module/submodule, where you need to use normal UNIX file
> semantics. To move submodules around, the user should be able to just
> use the normal UNIX "mv" commands, and git should be able to detect
> move actions after the fact.

If we disregard the commit info, the link will act exactly as a normal
tree/blob. Git can know we're moving a subproject by watching the
modules file. The main problem is to keep the modules file up-to-date
with reality. We could enforce modules file validity by disallowing such
operations and letting the user do a "force" operation which also alters
the modules file.
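
A sketch of that rule, assuming the modules file has already been read into
a dict of {link name: {"path": ...}}. The function name and the choice of
PermissionError are arbitrary, not anything git does today:

```python
def move_link(modules, old_path, new_path, force=False):
    """Move a path, keeping the (hypothetical) modules data consistent.

    Returns the name of the affected link, or None if old_path is not a
    submodule link and the move needs no special handling.
    """
    name = next(
        (n for n, entry in modules.items() if entry["path"] == old_path),
        None,
    )
    if name is None:
        return None  # ordinary tree/blob, nothing to enforce
    if not force:
        # Refuse, so the modules file can never silently go stale.
        raise PermissionError(
            f"{old_path} is a submodule link; use force to move it"
        )
    modules[name]["path"] = new_path  # the forced move also alters the file
    return name
```

The point of the design is that the only way to move a link is the one that
rewrites the modules data in the same step.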

> This now becomes a problem if you use symlinks to "unify" multiple checkouts
> of the same submodule at multiple places in the supermodule, and move
> the symlink around, as it easily can get dangling this way. Thus, you would
> not have a way to see what submodule this link was talking about.

The symlink only exists in the modules file. We only have the SHA1s
at the tree-level, and there we have everything underneath the
tree/blob SHA1 in our database. We will only know whether the symlink
in the modules file is dangling the next time we fetch from the
submodule. There we would notify the user, but our database is still
consistent.

> If you have a source commit chain A => B => C => D, you want
> to make any build commits totally independent: you first only
> are interested in a build commit for source versions A and D,
> and later find out that a build commit for B and C would be nice,
> too. If you force build commits into some history order, this
> order now would be A => D => B => C, which makes no sense.

It makes no sense because the user seems to have acted irrationally. The
commit-chain is completely valid as it has tracked the correct history
of the builds. I can't see any problems here; the build-project is
independent of the source-project, with its own history. We can hope
the user has given good explanations for his/her actions in the commit
messages though.

//Torgil