Re: git filter-branch should run git gc --auto

Hi,

On Wed, 23 Jan 2008, Mike Hommey wrote:

> On Tue, Jan 22, 2008 at 06:46:52PM -0800, Junio C Hamano wrote:
> > Kevin Ballard <kevin@xxxxxx> writes:
> > 
> > > I just glanced at git-filter-branch.sh (and I must say I was 
> > > incredibly surprised to find out it was a shell script) and it seems 
> > > it never runs git-gc or git-repack. Doesn't that end up with the 
> > > same problems as git-svn sans git-repack when filtering a large 
> > > number of commits? I was just thinking, if I were to 
> > > git-filter-branch on my massive repo (in fact, the same repo that 
> > > started this thread, with over 33000 commits in the upstream svn 
> > > repo), even if I just do something as simple as change the commit 
> > > msg, won't I end up with thousands of unreachable objects? I shudder 
> > > to think how many unreachable objects I would have if I pruned the 
> > > entire dports directory off of the tree.
> > >
> > > Am I missing something, or does git-filter-branch really not do any 
> > > garbage collection? I tried reading the source, but complex bash 
> > > scripts are almost as bad as perl in terms of readability.
> > 
> > Theoretically yes, and it largely depends on what you do, but 
> > filter-branch goes over the objects that already exist in your 
> > repository, and hopefully you won't be rewriting the majority of them.
> > 
> > So the impact of not repacking is probably much less painful in 
> > practice.
> > 
> > But again as I said, it largely depends on what you do in your filter.  
> > If you are upcasing (or convert to NFD ;-)) the contents of all of 
> > your blob objects, you would certainly want to repack every once in a 
> > while.
> 
> I wonder if it wouldn't be possible to have filter-branch use 
> fast-import, so that it would create a pack instead of a lot of loose 
> objects.

Not really; the filters are very much tuned to the index-modification and 
commit process.

And I doubt that gc --auto would help much; git-filter-branch creates 
gazillions of files, and it is that which is likely to bring performance 
down.  That is, if you choose _not_ to heed the comment in 
Documentation/git-filter-branch.txt, lines 44-46:

	Note that since this operation is extensively I/O expensive, it 
	might be a good idea to redirect the temporary directory off-disk 
	with the '-d' option, e.g. on tmpfs.  Reportedly the speedup is 
	very noticeable.
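
To make that concrete, here is a rough sketch of what heeding the comment 
could look like.  It is not taken from the documentation: the throwaway 
repository, the demo identity, the scratch directory name and the choice of 
/dev/shm as a tmpfs mount are all assumptions for illustration (the script 
falls back to the ordinary temp directory if /dev/shm is missing).  It 
rewrites three commit messages with the temporary directory redirected via 
'-d', then shows how many loose objects are lying around before letting 
gc --auto decide whether a repack is due.

```shell
#!/bin/sh
# Sketch only: everything here runs in a throwaway repository.
set -e

# Assumption: /dev/shm is a tmpfs mount (common on Linux); otherwise use $TMPDIR.
if [ -d /dev/shm ] && [ -w /dev/shm ]; then
    ramdir=/dev/shm
else
    ramdir=${TMPDIR:-/tmp}
fi

# Hypothetical identity so the demo commits work without any git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
# Newer git versions print a warning and pause before filtering unless this is set.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Build a tiny repository with three commits.
repo=$(mktemp -d "${TMPDIR:-/tmp}/fb-demo.XXXXXX")
cd "$repo"
git init -q .
for i in 1 2 3; do
    echo "$i" > file.txt
    git add file.txt
    git commit -q -m "commit $i"
done

# filter-branch refuses to reuse an existing temporary directory,
# so hand it a path that does not exist yet.
scratch="$ramdir/fb-scratch.$$"
git filter-branch -d "$scratch" \
    --msg-filter 'sed "s/^commit/rewritten/"' -- --all >/dev/null

# Rewritten subjects, newest first.
git log --format=%s

# Loose-object statistics; the old commits are still kept alive by refs/original/.
git count-objects -v

# Repacks only if the loose-object count exceeds the gc.auto threshold.
git gc --auto -q
```

On a repository this small gc --auto will almost certainly do nothing; the 
point is only where the '-d' option and the gc call would go.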

Ciao,
Dscho
