Re: 1.3.0 creating bigger packs than 1.2.3

Linus Torvalds <torvalds@xxxxxxxx> wrote:
> 
> 
> On Thu, 20 Apr 2006, Shawn Pearce wrote:
> > 
> > So with 1.3.0.g56c1 "git repack -a -d -f" did worse:
> > 
> >   Total 46391, written 46391 (delta 6649), reused 39742 (delta 0)
> >   129M pack-7f766f5af5547554bacb28c0294bd562589dc5e7.pack
> > 
> > I just tried -f on v1.2.3 and it did slightly better than before:
> > 
> >   Total 46391, written 46391 (delta 6847), reused 38012 (delta 0)
> >    59M pack-7f766f5af5547554bacb28c0294bd562589dc5e7.pack

Oddly enough, repacking the v1.2.3 pack using 1.3.0.g56c1 created an
even smaller pack ("git-repack -a -d"):

  Total 46391, written 46391 (delta 8253), reused 44985 (delta 6847)
   49M pack-7f766f5af5547554bacb28c0294bd562589dc5e7.pack

and repacking again with "git-repack -a -d" chopped another 1M:

  Total 46391, written 46391 (delta 8258), reused 46386 (delta 8253)
   48M pack-7f766f5af5547554bacb28c0294bd562589dc5e7.pack
  
but then adding -f definitely gives us the 2x explosion again:

  Total 46391, written 46391 (delta 6649), reused 37894 (delta 0)
  129M pack-7f766f5af5547554bacb28c0294bd562589dc5e7.pack
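
To compare runs like these side by side, the summary lines can be
parsed mechanically.  What follows is only a minimal Python sketch:
the regular expression and the parse_summary() helper are my own
illustration (they are not part of git), and they assume the exact
"Total N, written N (delta N), reused N (delta N)" format shown above.

```python
import re

# Matches the summary line printed at the end of a repack, as quoted above.
SUMMARY_RE = re.compile(
    r"Total (\d+), written (\d+) \(delta (\d+)\), "
    r"reused (\d+) \(delta (\d+)\)"
)


def parse_summary(line):
    """Return the five counters from one summary line as a dict."""
    m = SUMMARY_RE.search(line)
    if m is None:
        raise ValueError("not a repack summary line: %r" % line)
    keys = ("total", "written", "delta", "reused", "reused_delta")
    return dict(zip(keys, map(int, m.groups())))


# The two extremes reported above, both produced by 1.3.0.g56c1.
runs = [
    ("repack -a -d (48M)",
     "Total 46391, written 46391 (delta 8258), reused 46386 (delta 8253)"),
    ("repack -a -d -f (129M)",
     "Total 46391, written 46391 (delta 6649), reused 37894 (delta 0)"),
]

for label, line in runs:
    s = parse_summary(line)
    share = s["delta"] / s["total"]
    print("%-24s %5d deltas (%.1f%% of objects), %d reused deltas"
          % (label, s["delta"], 100 * share, s["reused_delta"]))
```

The two runs differ by 1609 deltas (8258 vs. 6649) out of 46391
objects, while the pack size goes from 48M to 129M.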

> Interesting. The bigger packs do generate fewer deltas, but they don't 
> seem to be _that_ much fewer. And the deltas themselves certainly 
> shouldn't be bigger.
> 
> It almost sounds like there's a problem with choosing what to delta 
> against, not necessarily a delta algorithm problem. Although that sounds a 
> bit strange, because I wouldn't have thought we actually changed the 
> packing algorithm noticeably since 1.2.3.
> 
> Hmm. Doing "gitk v1.2.3.. -- pack-objects.c" shows that I was wrong. Junio 
> did the "hash basename and direname a bit differently" thing, which would 
> appear to change the "find objects to delta against" a lot. That could be 
> it. 
> 
> You could try to revert that change:
> 
> 	git revert eeef7135fed9b8784627c4c96e125241c06c65e1
> 
> which needs a trivial manual fixup (remove the conflict entirely: 
> everything between the "<<<<" and ">>>>>" lines should go), and see if 
> that's it.

Whoa.  I did that revert and fixup on top of 'next'.  The pack
from "git-repack -a -d -f" is now even larger, with even fewer
deltas:

  Total 46391, written 46391 (delta 5148), reused 39565 (delta 0)
  171M pack-7f766f5af5547554bacb28c0294bd562589dc5e7.pack

> You can also try to see if
> 
> 	git repack -a -d -f --window=50
> 
> makes for a better pack (at the cost of a much slower repack). It makes 
> git try more objects to delta against, and can thus hide a bad sort order.

With --window=50 on 'next' (without the revert):

  Total 46391, written 46391 (delta 6666), reused 39723 (delta 0)
  129M pack-7f766f5af5547554bacb28c0294bd562589dc5e7.pack

For good measure I tried --window=100 and --window=500 with pretty
much the same result (slightly higher delta count but still a 129M pack).
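
For anyone following along, the idea behind the --window test can be
shown with a toy model.  This is a sketch under loud assumptions and
not git's pack-objects code: pick_bases(), the min_ratio threshold and
the use of Python's difflib as a similarity measure are all made up
for illustration.  Each object in a sorted list is only compared with
the `window` entries just before it, so a good delta base that the
sort order placed far away is found only if the window is wide enough.

```python
from difflib import SequenceMatcher


def pick_bases(objects, window, min_ratio=0.5):
    """Toy delta-base selection over an already sorted object list.

    objects is a list of (name, data) pairs.  For each object, only the
    `window` entries immediately before it are tried as delta bases; the
    most similar one above min_ratio wins, otherwise the base is None.
    """
    bases = {}
    for i, (name, data) in enumerate(objects):
        best, best_ratio = None, min_ratio
        for prev_name, prev_data in objects[max(0, i - window):i]:
            # autojunk=False keeps difflib from ignoring frequent characters.
            ratio = SequenceMatcher(None, prev_data, data,
                                    autojunk=False).ratio()
            if ratio > best_ratio:
                best, best_ratio = prev_name, ratio
        bases[name] = best
    return bases


# Two versions of the same file, separated by unrelated objects, which is
# what a poor sort order would produce.
v1 = "".join("line %d of file a\n" % n for n in range(5))
v2 = v1 + "one more line\n"
objects = [
    ("a-v1", v1),
    ("b", "B" * 60),
    ("c", "C" * 60),
    ("d", "D" * 60),
    ("a-v2", v2),
]

narrow = pick_bases(objects, window=2)
wide = pick_bases(objects, window=10)
print("window=2:  a-v2 base =", narrow["a-v2"])
print("window=10: a-v2 base =", wide["a-v2"])
```

With the narrow window a-v2 never gets compared against a-v1 and is
stored whole; with the wide window it finds a-v1 as its base.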

-- 
Shawn.