Re: [PATCH] consider previous pack undeltified object state only when reusing delta data

On Fri, 30 Jun 2006, Andreas Ericsson wrote:

> Johannes Schindelin wrote:
> > Hi,
> > 
> > On Thu, 29 Jun 2006, Nicolas Pitre wrote:
> > 
> > 
> > > Without this there would never be a chance to improve packing for
> > > previously undeltified objects.
> > 
> > 
> > Earlier this year, I was quite surprised to learn that multiple repackings
> > actually improved packing. Does that patch mean this feature is gone?
> > 
> 
> The patch Linus sent removes that feature. This one re-introduces it.

Not really.

Actually that multiple repacking "feature" was rather an artifact of the 
delta data reuse code and not really by design.  Here's what happened 
before:

Consider the first repack, where no delta exists yet, or "git-repack -a -f",
where the -f argument makes it ignore existing delta data.  In that
case all objects are sorted and a delta is attempted on each of them
within a window.

So to simplify things let's assume objects are numbered from 1 upwards.  
First obj #1 is added to the window.  Obj #2 attempts a delta against 
obj #1.  Obj #3 attempts a delta against objs #2 and #1.  Obj #4 
attempts a delta against objs #3, #2 and #1.  And so on for all objects:
each new object attempts a delta against the last 10 objects (the
default window size is 10) and the best delta, if any, is kept.
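
To make the window walk above concrete, here is a minimal Python sketch.
It is only a toy model of the ordering described in this message, not
the actual pack-objects code: delta_candidates() is a made-up name,
objects are plain integers, and no real delta is computed.

```python
from collections import deque

def delta_candidates(num_objects, window=10):
    """Toy model: map each object number to the objects it would try a
    delta against, newest first. Hypothetical helper, not git code."""
    recent = deque(maxlen=window)   # the sliding window of previous objects
    candidates = {}
    for obj in range(1, num_objects + 1):
        candidates[obj] = list(reversed(recent))
        recent.append(obj)          # the oldest entry falls out once full
    return candidates

cands = delta_candidates(12)
print(cands[1])    # obj #1 has nothing to delta against
print(cands[4])    # obj #4 tries #3, #2 and #1
print(cands[12])   # obj #12 tries only the last 10 objects
```

In this model obj #12 no longer sees obj #1, because the window has
already slid past it.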

In the end, some objects get deltified, some don't, and a new pack is 
produced.

When repacking without passing -f to git-repack, already deltified
objects are simply copied as is from the existing pack(s), avoiding
costly delta re-computation.  Still, without Linus' patch, non-deltified
objects were considered for deltification and deltas attempted on them.

So supposing that objects #1 through #10 were not deltified, and objects
#11 through #50 were deltified, then those deltified objects were
skipped over for the purpose of delta matching, and therefore object #51
ended up attempting a delta against objs #1 to #10 instead of #41 to #50
as in the previous run.  The net effect was similar to a larger window
for some objects, providing more opportunities for successful deltas and
therefore a smaller pack.
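
The same scenario can be sketched in Python.  Again this is only a toy
model under assumptions of mine: repack_candidates() and the "reused"
set are hypothetical names, and the only behavior modeled is that an
object whose delta is reused never enters the window.

```python
from collections import deque

def repack_candidates(num_objects, reused, window=10):
    """Toy model of a repack without -f, before Linus' patch.
    Objects in `reused` keep their existing delta and are skipped
    for delta matching. Hypothetical helper, not git code."""
    recent = deque(maxlen=window)
    candidates = {}
    for obj in range(1, num_objects + 1):
        if obj in reused:
            continue                # delta data copied as is from the old pack
        candidates[obj] = list(reversed(recent))
        recent.append(obj)
    return candidates

reused = set(range(11, 51))         # objs #11 through #50 were deltified
cands = repack_candidates(51, reused)
print(cands[51])                    # obj #51 now tries #10 down to #1
```

Here obj #51 gets a second set of candidates it never saw in the first
run, which is where the extra deltas came from.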

With Linus' patch, objects already known to be undeltified are skipped
too.  That means successive git-repack runs without the -f argument now
produce identical packs every time, and the artifact above is gone.

I think this is a good thing, since the packing behavior is now more
predictable.  And nothing is lost: if you want better packing like
before, you simply have to specify a slightly larger window size on the
first git-repack.  It'll take a bit more time, but running git-repack
many times also took more time in the end anyway.


Nicolas