Hi,

On Wed, 23 Jan 2008, Mike Hommey wrote:

> On Tue, Jan 22, 2008 at 06:46:52PM -0800, Junio C Hamano wrote:
> > Kevin Ballard <kevin@xxxxxx> writes:
> >
> > > I just glanced at git-filter-branch.sh (and I must say I was
> > > incredibly surprised to find out it was a shell script) and it seems
> > > it never runs git-gc or git-repack. Doesn't that end up with the
> > > same problems as git-svn sans git-repack when filtering a large
> > > number of commits? I was just thinking, if I were to
> > > git-filter-branch on my massive repo (in fact, the same repo that
> > > started this thread, with over 33000 commits in the upstream svn
> > > repo), even if I just do something as simple as change the commit
> > > msg wont I end up with thousands of unreachable objects? I shudder
> > > to think how many unreachable objects I would have if I pruned the
> > > entire dports directory off of the tree.
> > >
> > > Am I missing something, or does git-filter-branch really not do any
> > > garbage collection? I tried reading the source, but complex bash
> > > scripts are almost as bad as perl in terms of readability.
> >
> > Theoretically yes, and it largely depends on what you do, but
> > filter-branch goes over the objects that already exists in your
> > repository, and hopefully you won't be rewriting majority of them.
> >
> > So the impact of not repacking is probably much less painful in
> > practice.
> >
> > But again as I said, it largely depends on what you do in your filter.
> > If you are upcasing (or convert to NFD ;-)) the contents of all of
> > your blob objects, you would certainly want to repack every once in a
> > while.
>
> I wonder if it wouldn't be possible to have filter-branch use
> fast-import, so that it would create a pack instead of a lot of loose
> objects.

Not really; the filters are very much tuned to the index-modification
and commit process.
And I doubt that git gc --auto would help much; git-filter-branch creates
gazillions of temporary files, and that is likely to bring performance
down. If, that is, you choose _not_ to heed the comment in
Documentation/git-filter-branch.txt lines 44-46:

	Note that since this operation is extensively I/O expensive, it
	might be a good idea to redirect the temporary directory off-disk
	with the '-d' option, e.g. on tmpfs. Reportedly the speedup is very
	noticeable.

Ciao,
Dscho
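The '-d' tip above could look something like the following POSIX shell sketch. It assumes that /dev/shm is a tmpfs mount, which is usual on Linux but not guaranteed everywhere. The helper names choose_rewrite_dir and filter_branch_on_tmpfs are hypothetical and not part of git. The sketch only picks a fresh path, because filter-branch refuses to start with an already-existing temporary directory unless --force is given.

```shell
#!/bin/sh
# Sketch: run git filter-branch with its scratch directory off-disk.
# Assumption: /dev/shm is a tmpfs mount (usual on Linux, not guaranteed).
# Helper names are hypothetical; they are not part of git.

# Print a path that does not exist yet, for filter-branch's -d option.
# We only choose the name and leave creating the directory to
# filter-branch, which refuses an existing one unless --force is given.
choose_rewrite_dir() {
    if [ -d /dev/shm ] && [ -w /dev/shm ]; then
        base=/dev/shm
    else
        base=${TMPDIR:-/tmp}   # fall back to the usual on-disk temp dir
    fi
    n=0
    while [ -e "$base/git-rewrite.$$.$n" ]; do
        n=$((n + 1))
    done
    printf '%s\n' "$base/git-rewrite.$$.$n"
}

# Run git filter-branch with all given arguments, but point -d at the
# directory chosen above instead of the default .git-rewrite/.
filter_branch_on_tmpfs() {
    git filter-branch -d "$(choose_rewrite_dir)" "$@"
}
```

You would call it from inside the repository with the usual filter-branch arguments. For example, filter_branch_on_tmpfs --msg-filter 'sed "s/foo/bar/"' -- --all, where the sed expression is only a placeholder. Keep in mind that tmpfs lives in memory (and swap), so a --tree-filter over a very large checkout needs enough RAM to hold it.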