Re: Avery Pennarun's git-subtree?

skillzero@xxxxxxxxx · Fri, 23 Jul 2010 17:58:31 -0700

On Fri, Jul 23, 2010 at 3:50 PM, Avery Pennarun <apenwarr@xxxxxxxxx> wrote:

> Honest question: do you care about the wasted disk space and download
> time for these extra files?  Or just the fact that git gets slow when
> you have them?

I have a similar situation to the original poster (huge trees) and
for me it's all three: disk space, download time, and performance. My
tree has a few relatively small (< 20 MB) shared directories of common
code, a few large (2-6 GB) directories of code for OSes, and then
several medium size (< 500 MB) directories for application code. The
application developers only care about the app+shared directories (and
are very annoyed by the massive space and performance impact of the OS
directories). The firmware-only developers only care about OS+shared
and are mildly annoyed by the medium space and performance impact of
the app directories. I work on all of the pieces, but even I would
prefer to have things separated so when I work on the apps, git
status/etc doesn't take a big hit for close to a million files in the
OS directories (particularly when doing git status on Windows). Even
when using the -uno option to git status, it's still pretty slow (over
a minute).

git-submodule might be technically possible in this situation, but
having to commit and push each submodule and then commit and push the
super module makes it slightly worse than just dealing with the
space/download/performance issues of one huge repository.

git-subtree could also possibly help, but there's still extra work to
split and merge each repository. And I'm not sure how it handles
commit IDs across the repositories because I want to be able to say "I
fixed that bug in shared/code.c in commit abc123" and have both the
OS+shared and the apps+shared people be able to git log abc123 and see
the same change (and merge/cherry-pick/etc.).

I think what I want is a way to do a sparse checkout where some sort
of module is maintained in the git repository (probably just an
INI-style file with paths) so I can clone directly from the server and
it figures out the objects I need for the full history of only
apps+shared (or firmware+shared, etc.) on the server side and only
sends those objects. I still want to be able to branch, tag, and refer
to commit IDs. So I only take the space/download/performance hit of
directories included in the module, but I don't have to manually
maintain that view of the repository (as I do with git-submodule and
git-subtree).
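
To make the "INI-style file with paths" idea concrete, here is a
rough sketch of what such a module file and a reader for it might
look like. The section names, the "paths" key, and the directory
names are all hypothetical; git has no such file format, and this
only illustrates the client-side lookup, not the server-side object
selection.

```python
import configparser

# Hypothetical module definitions. The section and key names are invented
# for illustration and are not an existing git file format.
MODULES_INI = """
[module "apps"]
paths =
    shared/
    apps/

[module "firmware"]
paths =
    shared/
    os/
"""


def module_paths(ini_text, name):
    """Return the directory paths listed for the named module."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    raw = parser.get('module "%s"' % name, "paths")
    # The value is a multi-line string; keep one path per non-empty line.
    return [line.strip() for line in raw.splitlines() if line.strip()]


if __name__ == "__main__":
    print(module_paths(MODULES_INI, "apps"))
```

An app developer would ask for the "apps" module and a firmware
developer for the "firmware" module; both get shared/ because it is
listed in each section.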

The closest thing to that so far for me has been the sparse checkout
support added in git 1.7 combined with a convenience script I wrote.
Everyone still has a huge download and .git directory, but at least
the working copy is limited to the paths specified in the module so
git status isn't super slow (although just having all those objects in
the .git directory still slows it down quite a bit).
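
For reference, a convenience script along those lines could be
sketched as follows. This is not the actual script mentioned above,
only a guess at its shape: it assumes an existing full clone and
uses the core.sparseCheckout setting, the .git/info/sparse-checkout
file, and git read-tree -m -u that came with sparse checkout in git
1.7. The function names are made up for the example.

```python
import os
import subprocess


def sparse_patterns(paths):
    """Build the text of .git/info/sparse-checkout for the given directories."""
    # Anchor each directory at the repository root: "apps" -> "/apps/".
    lines = ["/" + p.strip("/") + "/" for p in paths]
    return "\n".join(lines) + "\n"


def apply_sparse_checkout(repo_dir, paths):
    """Restrict the working copy of an existing clone to the given directories."""
    subprocess.check_call(
        ["git", "config", "core.sparseCheckout", "true"], cwd=repo_dir)
    info_dir = os.path.join(repo_dir, ".git", "info")
    os.makedirs(info_dir, exist_ok=True)
    with open(os.path.join(info_dir, "sparse-checkout"), "w") as f:
        f.write(sparse_patterns(paths))
    # Re-read HEAD into the index so the working tree follows the new patterns.
    subprocess.check_call(["git", "read-tree", "-m", "-u", "HEAD"], cwd=repo_dir)
```

Calling apply_sparse_checkout(".", ["shared", "apps"]) would leave
only shared/ and apps/ in the working copy. The objects for the
other directories are still downloaded and stored in .git, which is
exactly the limitation described above.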