Asger Ottar Alstrup <asger@xxxxxxxx> writes:

> On Mon, May 25, 2009 at 7:54 PM, Avery Pennarun <apenwarr@xxxxxxxxx> wrote:
>> On Mon, May 25, 2009 at 1:35 PM, Asger Ottar Alstrup <asger@xxxxxxxx> wrote:
>>> So a poor man's system could work like this:
>>>
>>> - A reduced repository is defined by a list of paths in a file, I
>>>   guess with a format similar to .gitignore
>>
>> Are you sure you want to define the list with exclusions instead of
>> inclusions?  I don't really know your use case.
>
> Since the .gitignore format supports !, I believe that should not make
> much of a difference.
>
>> Anyway, if you're using git filter-branch, it'll be up to you to fix
>> the index to contain the list of files you want.  (See man
>> git-filter-branch)
>
> Yes, sure, and that is why I asked whether there is some tool in git
> that can give a list of concrete files surviving a .gitignore list of
> patterns.

I think you would want to use git-ls-files with the --exclude-from=<file>
option, and perhaps also -i/--ignored, to create a list of files to be
removed (using git-update-index) instead of a list of files to be kept.

>>> - To extract: A copy of the original repository is made.  This copy
>>>   is reduced using git filter-branch.  Is there some way of turning
>>>   a .gitignore syntax file into a concrete list of files?  Also, can
>>>   this entire step be done in one step without the copy?  Having to
>>>   copy the entire project first seems excessive.  Will filter-branch
>>>   preserve and/or prune pack files intelligently?
>>
>> You probably need to read about the differences between git trees,
>> blobs, and commits.  You're not actually "copying" anything; you're
>> just creating some new directory structures that contain the
>> *existing* blobs.  And of course the existing blobs are in your
>> existing packs.
>
> Thanks.  OK, I see now that filter-branch will not destroy the
> original repository.
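To make the "concrete list of files" step tangible, here is a throwaway
demo of the git-ls-files invocation I mean; every path and the pattern
file in it are invented for the illustration:

```shell
#!/bin/sh
# Demo only: build a scratch repo, then turn a .gitignore-style pattern
# file into a concrete list of tracked files matching those patterns.
set -e
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
mkdir docs images
echo spec > docs/spec.pdf
echo logo > images/logo.png
git add .
git -c user.name=a -c user.email=a@example.com commit -qm initial

# Patterns in .gitignore syntax naming what should be dropped:
printf 'images/\n' > "$tmp/patterns"

# -c restricts the walk to tracked (cached) files; -i shows only those
# matching the exclude patterns, i.e. the concrete files to remove.
matched=$(git ls-files -ci --exclude-from="$tmp/patterns")
echo "$matched"
```

The resulting list could then be fed to something like
`git update-index --force-remove --stdin` inside a filter-branch
--index-filter.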
> That is not at all obvious from reading the man page, when the very
> first sentence says that it will rewrite history.

What git-filter-branch does is to write _new_ history, and move the old
history to the refs/original/* namespace (that might have changed;
anyway, the old history should be available via the reflog).  The
visible effect is that history got rewritten.

> But the main point of this exercise is to reduce the size of the
> reduced repository so that it can be transferred effectively.  So
> after filter-branch, I guess I would run clone afterwards to make the
> new, smaller repository, and then the question becomes: Will clone
> reuse and prune packs intelligently?

Yes, it would... well, you have to take into account that an ordinary
clone over the local filesystem hardlinks the packfiles, so you need to
use the file:// trick to force a repack; you might also want to use
--reference to set up alternates.

But that is not necessary: if you want to effectively push a _subset_
of branches, you can define the remote info in an appropriate way, and
push would intelligently transfer only the needed objects.

[...]

> However, there is a large group of users that do not need this, but
> they DO need the entire history of the files they are interested in.
> Subversion does not provide this.  Also, Subversion is simply too slow
> to handle the kind of files we need to work with.  Also, we have run
> tests on the kind of files we have, and the delta compression that git
> uses is very effective for compressing the pdf and openoffice
> documents we use.  The big files we have are primarily image files,
> and obviously they do not compress very well.  Fortunately, they do
> not change much either.

You might want to turn off deltification for binary files via the
`delta` gitattribute; it might help (it might not).
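To see both points at once (the refs/original/* backup, and file://
forcing a real object transfer instead of hardlinks), here is a sketch;
the repository layout and filenames are invented:

```shell
#!/bin/sh
# Sketch, not a recipe: filter-branch keeps the pre-rewrite history
# under refs/original/*, and a file:// clone goes through the pack
# machinery, so pruned objects do not follow into the new clone.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1  # silence newer git's warning
tmp=$(mktemp -d)
git init -q "$tmp/orig"
cd "$tmp/orig"
echo kept > keep.txt
dd if=/dev/zero of=big.bin bs=1024 count=64 2>/dev/null
git add .
git -c user.name=a -c user.email=a@example.com commit -qm initial

# Rewrite every commit, dropping big.bin from the index.
git filter-branch --index-filter \
    'git rm -q --cached --ignore-unmatch big.bin' -- --all >/dev/null 2>&1

# The pre-rewrite history is still reachable from here:
backup=$(git for-each-ref --format='%(refname)' refs/original)
echo "$backup"

# file:// (rather than a plain path) forbids hardlinking, so only
# objects reachable from the rewritten refs are copied.
git clone -q "file://$tmp/orig" "$tmp/small"
files=$(git -C "$tmp/small" log --all --name-only --pretty=format:)
```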
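The subset-push setup could look like the following; the remote name
"mirror" and the branch layout are made up for the example:

```shell
#!/bin/sh
# Sketch: configure a remote so that a bare "git push" transfers only
# one branch (and therefore only the objects it needs).
set -e
tmp=$(mktemp -d)
git init -q --bare "$tmp/mirror.git"
git init -q "$tmp/work"
cd "$tmp/work"
echo hello > file.txt
git add file.txt
git -c user.name=a -c user.email=a@example.com commit -qm initial
branch=$(git symbolic-ref --short HEAD)
git branch private            # a branch we do NOT want to publish

git remote add mirror "$tmp/mirror.git"
# Only this refspec is pushed by a plain "git push mirror":
git config remote.mirror.push "refs/heads/$branch:refs/heads/$branch"
git push -q mirror

pushed=$(git -C "$tmp/mirror.git" for-each-ref \
             --format='%(refname:short)' refs/heads)
echo "$pushed"
```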
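Turning off deltification is a couple of lines in .gitattributes; a
sketch (the *.png/*.jpg patterns are examples only, adapt to the image
types actually in the repository):

```shell
#!/bin/sh
# Disable delta compression for image files via the delta attribute.
set -e
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
cat > .gitattributes <<'EOF'
*.png -delta
*.jpg -delta
EOF
# check-attr shows how a given path would be treated; "unset" means
# delta compression is disabled for it.
attr=$(git check-attr delta -- images/photo.png)
echo "$attr"
```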
--
Jakub Narebski
Poland
ShadeHawk on #git
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html