Re: [PATCH] diff-delta: produce optimal pack data

Nicolas Pitre <nico@xxxxxxx> · Fri, 24 Feb 2006 10:37:46 -0500 (EST)

On Fri, 24 Feb 2006, Junio C Hamano wrote:

> Nicolas Pitre <nico@xxxxxxx> writes:
> 
> > Indexing based on adler32 has a match precision based on the block size 
> > (currently 16).  Lowering the block size would produce smaller deltas 
> > but the indexing memory and computing cost increases significantly.
> 
> Indeed.
> 
> I had this patch in my personal tree for a while.  I was
> wondering why sometimes progress indication during "Deltifying"
> stage stops for literally several seconds, or more.

Note that what I said above is that _keeping_ adler32 for small blocks is 
even slower.  In other words, for small blocks, the version not using 
adler32 is about 3 times faster.
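
To put rough numbers on the block-size trade-off, here is a minimal 
sketch.  It is not the diff-delta code: the helper names are hypothetical, 
and the memory estimate assumes one pointer-sized entry plus one 
pointer-sized chain link per indexed block, with the source indexed once 
per non-overlapping block.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of index entries when the source buffer
 * is indexed once per non-overlapping block of block_size bytes. */
size_t index_entries(size_t src_size, size_t block_size)
{
	assert(block_size > 0);
	return src_size / block_size;
}

/* Hypothetical estimate of index memory, assuming one pointer-sized
 * entry plus one pointer-sized chain link per indexed block. */
size_t index_bytes(size_t src_size, size_t block_size)
{
	return index_entries(src_size, block_size) * 2 * sizeof(void *);
}
```

Under those assumptions, a 491102-byte source needs 30693 entries with a 
block size of 16 and 122775 entries with a block size of 4, so the index 
grows roughly in inverse proportion to the block size.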

I also noticed the significant slowdown after I made the 
improved progress patch.  The idea now is to detect 
pathological cases and break out of them early.
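
As an illustration only, one way to break out early is to cap how many 
candidate positions are tried for each target position.  The sketch below 
assumes the slow case comes from many source blocks landing in the same 
hash bucket (for example long runs of identical bytes); that cause, the 
function name best_match() and the MAX_PROBES value are all assumptions 
of mine, not something taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PROBES 64	/* hypothetical cap, not a value from the patch */

/*
 * Find the longest match for tgt[0..tgt_len) among the candidate
 * offsets in cand[], which stands in for one hash-bucket chain.
 * Gives up after MAX_PROBES candidates.  Returns the match length,
 * stores the matching source offset in *best_off and the number of
 * candidates examined in *probes.
 */
size_t best_match(const unsigned char *src, size_t src_len,
		  const size_t *cand, size_t ncand,
		  const unsigned char *tgt, size_t tgt_len,
		  size_t *best_off, size_t *probes)
{
	size_t best = 0, i;

	assert(best_off && probes);
	*probes = 0;
	for (i = 0; i < ncand && *probes < MAX_PROBES; i++) {
		size_t off = cand[i], len = 0;

		(*probes)++;
		/* extend the match byte by byte from this candidate */
		while (off + len < src_len && len < tgt_len &&
		       src[off + len] == tgt[len])
			len++;
		if (len > best) {
			best = len;
			*best_off = off;
		}
	}
	return best;
}
```

The cap bounds the work per target position at the cost of possibly 
missing a longer match further down the chain, so the resulting delta 
may be slightly larger.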

> In Linux 2.6 repository, these object pairs take forever to
> delta.
> 
>         blob 9af06ba723df75fed49f7ccae5b6c9c34bc5115f -> 
>         blob dfc9cd58dc065d17030d875d3fea6e7862ede143
>         size (491102 -> 496045)
>         58 seconds
> 
>         blob 4917ec509720a42846d513addc11cbd25e0e3c4f -> 
>         blob dfc9cd58dc065d17030d875d3fea6e7862ede143
>         size (495831 -> 496045)
>         64 seconds

Thanks for this.  I'll see what I can do to tweak the code to better 
cope with those.  Just keep my fourth delta patch in the pu branch for 
now.

Nicolas