Re: submodules' shortcomings, was Re: RFC: display dirty submodule working directory in git gui and gitk

On Mon, Jan 4, 2010 at 5:29 PM, Johannes Schindelin
<Johannes.Schindelin@xxxxxx> wrote:
> On Mon, 4 Jan 2010, Jens Lehmann wrote:
>> IMVHO using the tree sha1 for a submodule seems to be the 'natural' way
>> to include another git repo. And it gives the reproducibility i expect
>> from a scm. Or am i missing something?
>
> You do remember the discussion at the Alles wird Git about the need for
> Subversion external-like behavior, right?

I'm not sure why this is such an issue.  Basically, non-version-locked
submodules are about the easiest thing in the world; that's why CVS
and SVN supported them first.  (SVN later added version-locking like
git has.)

All you need is a .gitignore entry and a trivial script that checks
out the external.  If you want to be fancy, this operation could be
part of git, but it's such a totally different case (and an easy one,
no less) that I think it ought to be treated totally separately.
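
To make "trivial script" concrete, here is a minimal sketch in Python. It
assumes the external's directory is already listed in the superproject's
.gitignore; the function names, the URL and the path are all made up for
illustration. It only clones the external if it is missing and otherwise
fast-forwards it, which is all that "non-version-locked" really means.

```python
import os
import subprocess


def external_steps(url, path, already_cloned):
    """Return (cwd, argv) pairs that bring an unlocked external up to date.

    The names here are hypothetical; this is one way to write the script,
    not something git ships.
    """
    if already_cloned:
        # Already checked out: follow whatever the remote branch has now.
        return [(path, ["git", "pull", "--ff-only"])]
    # First run: a plain clone into the ignored directory.
    return [(None, ["git", "clone", url, path])]


def sync_external(url, path):
    """Run the steps for one external (requires git on PATH)."""
    cloned = os.path.isdir(os.path.join(path, ".git"))
    for cwd, argv in external_steps(url, path, cloned):
        subprocess.check_call(argv, cwd=cwd)
```

Calling sync_external("git://example.invalid/lib.git", "vendor/lib") from
the top of the superproject would then be the whole "checkout externals"
step.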

> - among other use cases, submodules are recommended for sharing content
>  between two different repositories. But it is part of the design that it
>  is _very_ easy to forget to commit, or push the changes in the submodule
>  that are required for the integrity of the superproject.
[...]
> - working directories with GIT_DIRs are a very different beast from single
>  files.  That alone leads to a _lot_ of problems.  The original design of
>  Git had only a couple of states for named content (AKA files): clean,
>  added, removed, modified.  The states that are possible with submodules
>  are for the most part not handled _at all_ by most Git commands (and it
>  is sometimes very hard to decide what would be the best way to handle
>  those states, either).  Just think of a submodule at a different
>  revision than committed in the superproject, with uncommitted changes,
>  ignored and unignored files, a few custom hooks, a bit of additional
>  metadata in the .git/config, and just for fun, a few temporary files in
>  .git/ which are used by the hooks.


I think this is primarily because checked-out submodules currently
have their own .git directories (with their own config, index, etc).
If they were considered *part* of the superproject's repo checkout, and
updated upon switching branches, etc, this whole class of problems
would go away.

> - that use case -- sharing content between different repositories -- is
>  not really supported by submodules, but rather an afterthought.  This is
>  all too obvious when you look at the restriction that the shared content
>  must be in a single subdirectory.

I haven't found the subdir requirement to be much of an issue, at
least on Unix where I can simply work around it using symlinks from
the superproject into the subproject.  It's obviously more gross on
Windows, but I've worked around it there too.  This one isn't a daily
aggravation for me, though maybe it is for others.  And any cure I can
think of sounds rather worse than the disease.

> - submodules would be a perfect way to provide a fast-forward-only media
>  subdirectory that is written to by different people (artists) than to
>  the superproject (developers).  But there is no mechanism to enforce
>  shallow fetches, which means that this use case cannot be handled
>  efficiently using Git.

I doubt you want to "enforce" shallow fetches.  And if you just want
to "allow" shallow fetches, or default to shallow fetches, I'd think
it would be pretty easy to add.  This hasn't been important to me
either.  (It seems to be not too important to git users in general, or
git's support *in general* for shallow repositories would be more
featureful.)

> - while it might be called clever that the submodules' metadata are stored
>  in .gitmodules in the superproject (and are therefore naturally tracked
>  with Git), the synchronization with .git/config is performed exactly
>  once -- when you initialize the submodule.  You are likely to miss out
>  on _every_ change you pulled into the superproject.

This could be fixed too, though I gave up on git-submodule before I
bothered to fix it myself.

The correct solution here is simply to not ever copy the settings from
.gitmodules into .git/config.  Instead, git-submodule should read
.gitmodules as defaults, and then override those defaults with
anything in .git/config.  99% of users will probably not need to ever
put any of their settings in .git/config, and so this problem
disappears.
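
A rough sketch of that lookup order, in Python. It assumes both files have
already been dumped to text in "key=value" form, for example with
"git config -f .gitmodules --list" and "git config --list"; the function
names are invented here and this is not how git-submodule is actually
implemented.

```python
def parse_config_list(text):
    """Parse 'key=value' lines (as printed by `git config --list`) into a dict."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            settings[key] = value
    return settings


def effective_submodule_settings(gitmodules_text, local_text):
    """Treat .gitmodules as defaults and let .git/config override them."""
    merged = parse_config_list(gitmodules_text)
    local = parse_config_list(local_text)
    # Only submodule.* keys from .git/config are relevant as overrides.
    for key, value in local.items():
        if key.startswith("submodule."):
            merged[key] = value
    return merged
```

Because nothing is ever copied, a changed URL in a freshly pulled
.gitmodules takes effect immediately unless the user has deliberately
overridden that key locally.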

> All in all, submodules are very clumsy to work with, and you are literally
> forced to provide scripts in the superproject to actually work with the
> submodules.

Agreed; I do this in every project which uses git-submodule.  (And
from doing so, I learned that the value added by git-submodule is
nearly zero.  My script does most of the work, and it could just as
easily check out the submodule as a git repo too.  I could even choose
to version-lock or not version-lock the checked-out submodule: just
hardcode the commitid into my script!)
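
For the version-locked variant, something like the following sketch would
do. The commit id is a placeholder, the URL and path are made up, and the
function names are hypothetical; the point is only that pinning amounts to
one extra "git checkout" of a hardcoded id.

```python
import subprocess

# Placeholder commit id, purely illustrative; a real script would
# hardcode the commit the superproject was tested against.
PINNED_COMMIT = "0123456789abcdef0123456789abcdef01234567"


def locked_checkout_steps(url, path, commit, already_cloned):
    """Return (cwd, argv) pairs that put the external at exactly `commit`."""
    steps = []
    if already_cloned:
        # Make sure the pinned commit is available locally.
        steps.append((path, ["git", "fetch", "origin"]))
    else:
        steps.append((None, ["git", "clone", "--no-checkout", url, path]))
    # Checking out a commit id leaves the external on a detached HEAD.
    steps.append((path, ["git", "checkout", commit]))
    return steps


def run_steps(steps):
    """Execute the steps in order (requires git on PATH)."""
    for cwd, argv in steps:
        subprocess.check_call(argv, cwd=cwd)
```

Since the script itself is tracked in the superproject, bumping the pinned
id is an ordinary commit there.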

> I do not think that --include-submodules is a good default.  It is just
> too expensive in terms of I/O even to check the status in a superproject
> with a lot of submodules.

I've thought about this a lot, and I think having a special case for
submodules here is the wrong line of thinking.  A big project
*without* submodules has this same problem.  The "real" solution is to
just make status checks faster.

(This is actually possible to do: in the extreme case, you just have a
daemon running with inotify or the Windows equivalent.  TortoiseSVN
reputedly does something like this.  I've thought of writing such a
daemon myself to just twiddle --assume-{un,}changed flags at the right
times, particularly since status checks in Windows are so ridiculously
slow.  But I got frustrated when it was *still* slow even after
setting --assume-unchanged on all the files in the index.  git still
scans directories to detect *unknown* files, and there seems to be no
way to turn it off or, moreover, to provide the list of unknown files
from some other source.)
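
The flag-twiddling part of such a daemon could look roughly like this
sketch. It assumes some watcher (inotify or otherwise, not shown) hands
over the set of paths it has seen change; the function name is made up.
The only git interface it relies on is "git update-index" with
--assume-unchanged and --no-assume-unchanged.

```python
def assume_flag_commands(tracked, changed):
    """Build `git update-index` calls from a watcher's view of the tree.

    `tracked` is the list of paths in the index; `changed` is the set of
    paths the (hypothetical) watcher reported as touched.
    """
    changed = set(changed)
    dirty = sorted(p for p in tracked if p in changed)
    clean = sorted(p for p in tracked if p not in changed)
    commands = []
    if clean:
        # Untouched files: tell git not to stat them.
        commands.append(["git", "update-index", "--assume-unchanged", "--"] + clean)
    if dirty:
        # Touched files: git must look at these again.
        commands.append(["git", "update-index", "--no-assume-unchanged", "--"] + dirty)
    return commands
```

Note that this does nothing about the untracked-file scan described above,
so it only addresses half of the cost.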

Have fun,

Avery