On 12/15/06, Josef Weidendorfer <Josef.Weidendorfer@xxxxxx> wrote:
That all sounds fine, but how do you create such symlinks in practice?
I'm very open to suggestions here, but the concept growing in my head is based around Linus' 'module' file and keeping things simple: a git configuration file that specifies:

* link name for reference
* local path to link
* submodule source
* submodule path to tree/blob
* submodule commit / HEAD / branch
* options (depth-limit, ...)

I'm reconsidering having the path name in the link; it should be sufficient to have two SHA1s, one for the commit and one for the tree/blob. The super-module should have the tree/blob in its database, so that the link part is only there for version information and reference (checking dirty state or history on the submodule). This way it is easy to clone the super-project and use it without having to map up all sub-project sources. Sub-project sources are not important for version information and could always be specified in the project in a README-type file.
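To make the fields listed above a bit more concrete, here is a rough sketch of what one entry in such a modules file might look like and how a tool could read it. Nothing here is an existing git format: the section name `link`, the key names (`path`, `source`, `subpath`, `commit`, `depth`) and the example.org URL are all assumptions made up for illustration.

```python
import configparser
from dataclasses import dataclass

# Hypothetical modules file. Section and key names are assumptions,
# not an existing git file format.
EXAMPLE = """
[link "cairo-header"]
path = include/cairo.h
source = git://example.org/cairo.git
subpath = src/cairo.h
commit = HEAD
depth = 1
"""

@dataclass
class Link:
    name: str      # link name for reference
    path: str      # local path to link
    source: str    # submodule source
    subpath: str   # submodule path to tree/blob
    commit: str    # submodule commit / HEAD / branch
    options: dict  # anything else, e.g. depth-limit

def parse_modules(text):
    """Read every [link "..."] section into a Link record."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    links = []
    for section in parser.sections():
        kind, _, rest = section.partition(" ")
        if kind != "link":
            continue
        items = dict(parser.items(section))
        links.append(Link(
            name=rest.strip('"'),
            path=items.pop("path"),
            source=items.pop("source"),
            subpath=items.pop("subpath"),
            commit=items.pop("commit"),
            options=items,  # remaining keys are treated as options
        ))
    return links
```

Calling `parse_modules(EXAMPLE)` gives one `Link` named `cairo-header`, with the leftover `depth` key ending up in `options`.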
Especially, what is the SCM user supposed to do to change the link target, ie. from <commit>/path/to/subtree to <commit>/path2/to2/subtree2 ? Should this do a re-checkout at the other point?
That would be a change in the modules file, maybe through a command that also fixes the link. The link will have to be updated in the index and committed as normal.
By linking a file from a submodule, such a link seems to force that this file has to be at a fixed position in the submodule. Otherwise, some magic has to happen when the file is moved in the submodule, possibly leading to a dangling link, eg. if the whole subdirectory specified in the link is removed.
Since we have the SHA1 (this is what we're using) and the tree/blob information in the super-module's database, the change itself is not a problem. The problem is tracking renames/moves, and your remove case, in the submodule. The tool that tracks the submodule should probably warn/exit here, and we would fix up the modules file manually.
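The warn/exit behaviour could be sketched roughly as below. This is only an illustration under assumptions: the function names are hypothetical, and the submodule's new commit is modelled as a plain list of file paths (with real git, something like `git ls-tree -r --name-only <commit>` could supply that list).

```python
def link_target_exists(paths, subpath):
    """True if subpath names a blob, or a tree containing blobs, in paths."""
    prefix = subpath.rstrip("/") + "/"
    return any(p == subpath or p.startswith(prefix) for p in paths)

def check_link(subpath, new_commit_paths):
    """Hypothetical check run before moving a link to a new submodule commit.

    Raises instead of guessing, so the modules file gets fixed by hand.
    """
    if not link_target_exists(new_commit_paths, subpath):
        raise LookupError(
            "link target %r is gone in the new submodule commit; "
            "update the modules file manually" % subpath
        )
    return subpath

# Paths present in two made-up submodule commits.
old_commit = ["src/cairo.h", "src/cairo.c", "README"]
new_commit = ["lib/cairo.h", "lib/cairo.c", "README"]  # src/ was renamed
```

With these lists, `check_link("src/cairo.h", old_commit)` passes, while the same call against `new_commit` raises, which is the point where the user would step in.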
IMHO this is getting way too complex.
One of the complex situations here, as I see it, is handling the ability to track/check out only a subset (tree/blob) of the submodule. This is also quite an important feature: in my example it means the difference between tracking one header file and tracking the whole source.
If you only want to check out part of a submodule, this should be done with path-limiting checkouts, which should be a feature totally independent from submodules.
If we can do path-limiting checkouts on a repo (module), we can also do them on a submodule, since they are exactly the same. This is a very powerful feature, and it would be a huge waste if a super-module were not allowed to use it on submodules.
And if you want to limit the number of objects transferred in cloning of a subproject, it is better to further split this subproject into multiple subprojects itself.
What if we have no control over the submodule? It can be tracked from upstream, sourceforge, another company, etc. Submodules will often live their own lives and could be X, kernel, gcc, cairo, whatever, ...
The problem is not the representation in the git repository, but the checked out module/submodule, where you need to use normal UNIX file semantics. To move submodules around, the user should be able to just use the normal UNIX "mv" commands, and git should be able to detect move actions after the fact.
If we disregard the commit info, the link will act exactly as a normal tree/blob. Git can know we're moving a subproject by watching the modules file. The main problem is to keep the modules file up-to-date with reality. We could enforce modules file validity by disallowing such operations and letting the user do a "force" operation which also alters the modules file.
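As a rough sketch of that "disallow unless forced" idea: the names below (`move_link`, `ModulesOutOfDate`) and the dict layout are hypothetical, and the modules file is modelled as a simple mapping from local link path to link name.

```python
class ModulesOutOfDate(Exception):
    """Raised when a move would leave the modules file stale."""

def move_link(modules, old_path, new_path, force=False):
    """Move a link's local path, keeping the modules mapping in sync.

    modules maps local link path -> link name (assumed layout).
    Returns a new mapping; the input is left untouched.
    """
    if old_path not in modules:
        return dict(modules)  # not a link, nothing to guard
    if not force:
        raise ModulesOutOfDate(
            "%r is a submodule link; use force to move it and "
            "update the modules file" % old_path
        )
    updated = dict(modules)
    updated[new_path] = updated.pop(old_path)
    return updated
```

A plain move of a linked path is refused, while the forced variant returns a mapping where the entry has followed the move, so the modules file never disagrees with the tree.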
This now becomes a problem if you use symlinks to "unify" multiple checkouts of the same submodule at multiple places in the supermodule, and move the symlink around, as it easily can get dangling this way. Thus, you would not have a way to see what submodule this link was talking about.
The symlink only exists in the modules file. We only have the SHA1s at the tree level, and there we have everything underneath the tree/blob SHA1 in our database. We will only know whether the symlink in the modules file is dangling the next time we fetch from the submodule; at that point we would notify the user, but our database is still consistent.
If you have a source commit chain A => B => C => D, you want to make any build commits totally independent: you first only are interested in a build commit for source versions A and D, and later find out that a build commit for B and C would be nice, too. If you force build commits into some history order, this order now would be A => D => B => C, which makes no sense.
It makes no sense because the user seems to have acted irrationally. The commit chain is completely valid, as it has tracked the correct history of the builds. I can't see any problems here; the build project is independent of the source project, with its own history. We can hope the user has given good explanations for his/her actions in the commit messages, though.

//Torgil