Asger Ottar Alstrup <asger@xxxxxxxx> writes:

> On Mon, May 25, 2009 at 7:54 PM, Avery Pennarun <apenwarr@xxxxxxxxx> wrote:
>> On Mon, May 25, 2009 at 1:35 PM, Asger Ottar Alstrup <asger@xxxxxxxx> wrote:
>>> So a poor man's system could work like this:
>>>
>>> - A reduced repository is defined by a list of paths in a file, I
>>>   guess with a format similar to .gitignore
>>
>> Are you sure you want to define the list with exclusions instead of
>> inclusions?  I don't really know your use case.
>
> Since the .gitignore format supports !, I believe that should not make
> much of a difference.
>
>> Anyway, if you're using git filter-branch, it'll be up to you to fix
>> the index to contain the list of files you want.  (See man
>> git-filter-branch)
>
> Yes, sure, and that is why I asked whether there is some tool in git
> that can give a list of concrete files surviving a .gitignore list of
> patterns.

I think you would want to use git-ls-files with the --exclude-from=<file>
option, and perhaps also -i/--ignored, to create a list of files to be
removed (using git-update-index) instead of a list of files to be kept.

>>> - To extract: A copy of the original repository is made.  This copy
>>>   is reduced using git filter-branch.  Is there some way of turning
>>>   a .gitignore syntax file into a concrete list of files?  Also, can
>>>   this entire step be done in one step without the copy?  Having to
>>>   copy the entire project first seems excessive.  Will filter-branch
>>>   preserve and/or prune pack files intelligently?
>>
>> You probably need to read about the differences between git trees,
>> blobs, and commits.  You're not actually "copying" anything; you're
>> just creating some new directory structures that contain the
>> *existing* blobs.  And of course the existing blobs are in your
>> existing packs.
>
> Thanks.  OK, I see now that filter-branch will not destroy the
> original repository.
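To make the "concrete list of files" step tangible, here is a throwaway
demo of the git-ls-files invocation I mean; every path and the pattern
file in it are invented for the illustration:

```shell
#!/bin/sh
# Demo only: build a scratch repo, then turn a .gitignore-style pattern
# file into a concrete list of tracked files matching those patterns.
set -e
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
mkdir docs images
echo spec > docs/spec.pdf
echo logo > images/logo.png
git add .
git -c user.name=a -c user.email=a@example.com commit -qm initial

# Patterns in .gitignore syntax naming what should be dropped:
printf 'images/\n' > "$tmp/patterns"

# -c restricts the walk to tracked (cached) files; -i shows only those
# matching the exclude patterns, i.e. the concrete files to remove.
matched=$(git ls-files -ci --exclude-from="$tmp/patterns")
echo "$matched"
```

The resulting list could then be fed to something like
`git update-index --force-remove --stdin` inside a filter-branch
--index-filter.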
> That is not at all obvious from reading the man page, when the very
> first sentence says that it will rewrite history.

What git-filter-branch does is to write _new_ history, and move the old
history to the refs/original/* namespace (that might have changed;
anyway, the old history should be available via the reflog).  The
visible effect is that history got rewritten.

> But the main point of this exercise is to reduce the size of the
> reduced repository so that it can be transferred effectively.  So
> after filter-branch, I guess I would run clone afterwards to make the
> new, smaller repository, and then the question becomes: Will clone
> reuse and prune packs intelligently?

Yes, it would... well, you have to take into account that an ordinary
clone over the local filesystem hardlinks the packfiles, so you need to
use the file:// trick to force a repack; you might also want to use
--reference to set up alternates.

But that is not necessary: if you want to effectively push a _subset_
of branches, you can define the remote info in an appropriate way, and
push would intelligently transfer only the needed objects.

[...]

> However, there is a large group of users that do not need this, but
> they DO need the entire history of the files they are interested in.
> Subversion does not provide this.  Also, Subversion is simply too slow
> to handle the kind of files we need to work with.  Also, we have run
> tests on the kind of files we have, and the delta compression that git
> uses is very effective for compressing the pdf and openoffice
> documents we use.  The big files we have are primarily image files,
> and obviously they do not compress very well.  Fortunately, they do
> not change much either.

You might want to turn off deltification for binary files via the
`delta` gitattribute; it might help (it might not).
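To see both points at once (the refs/original/* backup, and file://
forcing a real object transfer instead of hardlinks), here is a sketch;
the repository layout and filenames are invented:

```shell
#!/bin/sh
# Sketch, not a recipe: filter-branch keeps the pre-rewrite history
# under refs/original/*, and a file:// clone goes through the pack
# machinery, so pruned objects do not follow into the new clone.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1  # silence newer git's warning
tmp=$(mktemp -d)
git init -q "$tmp/orig"
cd "$tmp/orig"
echo kept > keep.txt
dd if=/dev/zero of=big.bin bs=1024 count=64 2>/dev/null
git add .
git -c user.name=a -c user.email=a@example.com commit -qm initial

# Rewrite every commit, dropping big.bin from the index.
git filter-branch --index-filter \
    'git rm -q --cached --ignore-unmatch big.bin' -- --all >/dev/null 2>&1

# The pre-rewrite history is still reachable from here:
backup=$(git for-each-ref --format='%(refname)' refs/original)
echo "$backup"

# file:// (rather than a plain path) forbids hardlinking, so only
# objects reachable from the rewritten refs are copied.
git clone -q "file://$tmp/orig" "$tmp/small"
files=$(git -C "$tmp/small" log --all --name-only --pretty=format:)
```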
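The subset-push setup could look like the following; the remote name
"mirror" and the branch layout are made up for the example:

```shell
#!/bin/sh
# Sketch: configure a remote so that a bare "git push" transfers only
# one branch (and therefore only the objects it needs).
set -e
tmp=$(mktemp -d)
git init -q --bare "$tmp/mirror.git"
git init -q "$tmp/work"
cd "$tmp/work"
echo hello > file.txt
git add file.txt
git -c user.name=a -c user.email=a@example.com commit -qm initial
branch=$(git symbolic-ref --short HEAD)
git branch private            # a branch we do NOT want to publish

git remote add mirror "$tmp/mirror.git"
# Only this refspec is pushed by a plain "git push mirror":
git config remote.mirror.push "refs/heads/$branch:refs/heads/$branch"
git push -q mirror

pushed=$(git -C "$tmp/mirror.git" for-each-ref \
             --format='%(refname:short)' refs/heads)
echo "$pushed"
```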
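Turning off deltification is a couple of lines in .gitattributes; a
sketch (the *.png/*.jpg patterns are examples only, adapt to the image
types actually in the repository):

```shell
#!/bin/sh
# Disable delta compression for image files via the delta attribute.
set -e
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
cat > .gitattributes <<'EOF'
*.png -delta
*.jpg -delta
EOF
# check-attr shows how a given path would be treated; "unset" means
# delta compression is disabled for it.
attr=$(git check-attr delta -- images/photo.png)
echo "$attr"
```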
--
Jakub Narebski
Poland
ShadeHawk on #git
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html