Hi, On Mon, 4 Jan 2010, Jens Lehmann wrote: > Am 04.01.2010 10:44, schrieb Johannes Schindelin: > > The real problem is that submodules in the current form are not very > > well designed. > > IMVHO using the tree sha1 for a submodule seems to be the 'natural' way > to include another git repo. And it gives the reproducibility i expect > from a scm. Or am i missing something? You do remember the discussion at the Alles wird Git about the need for Subversion external-like behavior, right? > It looks to me as most shortcomings come from the fact that most git > commands tend to ignore submodules (and if they don't, like git gui and > gitk do now, they e.g. only show certain aspects of their state). It is not only ignoring. It is not being able to cope with the state only submodules can be in (see below). > Submodules are in heavy use in our company since last year. Virtually > every patch i submitted for submodules came from that experience and > scratched an itch i or one of my colleagues had (and the situation did > already improve noticeably by the few things we changed). We are still > convinced that using submodules was the right decision. But some work > has still to be done to be able to use them easily and to get rid of > some pitfalls. Submodules may be the best way you have in Git for your workflow ATM. But that does not mean that the submodule design is in any way thought-through. Just a few shortcomings that do show up in my main project (and to a small extent in msysGit, as you are probably aware): - submodules were designed with a strong emphasis on not being forced to check them out. But Git makes it very unconvenient to actually check submodules out, let alone check them out at clone-time. And it is outright impossible to _enforce_ a submodule to be checked out. - among other use cases, submodules are recommended for sharing content between two different repositories. But it is part of the design that it is _very_ easy to forget to commit, or push the changes in the submodule that are required for the integrity of the superproject. - that use case -- sharing content between different repositories -- is not really supported by submodules, but rather an afterthought. This is all too obvious when you look at the restriction that the shared content must be in a single subdirectory. - submodules would be a perfect way to provide a fast-forward-only media subdirectory that is written to by different people (artists) than to the superproject (developers). But there is no mechanism to enforce shallow fetches, which means that this use case cannot be handled efficiently using Git. - related are the use cases where it is desired not to have a fixed submodule tip committed to the superproject, but always to update to the current, say, master (like Subversion's externals). This use case has been wished away by the people who implemented submodules in Git. But reality has this nasty habit of ignoring your wishes, does it not? - there have been patches supporting rebasing submodules, i.e. submodules where a "git submodule update" rebases the current branch to the revision committed to the superproject rather than detaching the HEAD, which everybody who ever contributed to a project with submodules should agree is a useful thing. But the patches only have been discussed to death, to the point where the discussion's information content was converging to zero, yet the patches did not make it into Git. (FWIW this is one reason why I refuse to write patches to git-submodule.sh: I refuse to let my time to be wasted like that.) - working directories with GIT_DIRs are a very different beast from single files. That alone leads to a _lot_ of problems. The original design of Git had only a couple of states for named content (AKA files): clean, added, removed, modified. The states that are possible with submodules are for the most part not handled _at all_ by most Git commands (and it is sometimes very hard to decide what would be the best way to handle those states, either). Just think of a submodule at a different revision than committed in the superproject, with uncommitted changes, ignored and unignored files, a few custom hooks, a bit of additional metadata in the .git/config, and just for fun, a few temporary files in .git/ which are used by the hooks. - while it might be called clever that the submodules' metadata are stored in .gitmodules in the superproject (and are therefore naturally tracked with Git), the synchronization with .git/config is performed exactly once -- when you initialize the submodule. You are likely to miss out on _every_ change you pulled into the superproject. All in all, submodules are very clumsy to work with, and you are literally forced to provide scripts in the superproject to actually work with the submodules. > > In ths short run, we can paper over the shortcomings of the submodules > > by introducing a command line option "--include-submodules" to > > update-refresh, diff-files and diff-index, though. > > Maybe this is the way to go for now (and hopefully we can turn this > option on by default later because we did the right thing ;-). I do not think that --include-submodules is a good default. It is just too expensive in terms of I/O even to check the status in a superproject with a lot of submodules. Besides, as long as there is enough reason to have out-of-Git alternative solutions such as repo, submodules deserve to be 2nd-class citizens. Ciao, Dscho -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html