Re: [PATCH] diff-delta: produce optimal pack data

Oops... I forgot to complete one paragraph.

On Sat, 25 Feb 2006, Nicolas Pitre wrote:

> I of course looked at the time to pack vs the size reduction in my 
> tests, and like I said above the cost is well balanced.  The only 
> issue is that smaller blocks are more likely to trip over 
> pathological data sets.  That problem exists with larger blocks 
> too, to a lesser degree of course, but still.  For example, using a 
> block size of 16 with adler32, computing a delta between two files 

... as provided by Carl takes up to _nine_ minutes for a _single_ delta!
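For readers unfamiliar with the scheme: the delta code fingerprints
fixed-size blocks of the source so matching runs in the target can be
found by hash lookup.  Here is a minimal sketch of that indexing step,
assuming zlib's adler32(); the block size, names and printf output are
illustrative only, not the actual diff-delta code:

#include <stdio.h>
#include <zlib.h>		/* adler32() */

#define BLK 16			/* the 16-byte block size discussed above */

/* Fingerprint every BLK-sized block of buf with adler32.  A real
 * delta generator would insert these fingerprints into a hash table
 * keyed by the checksum; here we just print them. */
static void index_blocks(const unsigned char *buf, size_t len)
{
	uLong seed = adler32(0L, Z_NULL, 0);	/* canonical initial value */
	size_t i;

	for (i = 0; i + BLK <= len; i += BLK)
		printf("block at %6zu: %08lx\n", i,
		       adler32(seed, buf + i, BLK));
}

int main(void)
{
	static const unsigned char data[] =
		"an example source buffer to index, block by block";
	index_blocks(data, sizeof(data) - 1);
	return 0;
}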

So regardless of the block size used, the issue right now has more to do 
with that combinatorial explosion than with the actual block size.  And 
preventing that pathological case from exploding out of bounds is pretty 
easy to do.
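One plausible way to bound the work is sketched below.  This is not the
actual patch; MAX_CHAIN, index_entry and find_match are hypothetical
names.  The idea is simply to stop scanning a hash chain after a fixed
number of candidates, so a bucket full of identical blocks can no longer
blow up the search:

#include <stddef.h>

#define MAX_CHAIN 64	/* arbitrary cap, for illustration only */

/* Hypothetical hash-table entry: all blocks whose fingerprints fall
 * into the same bucket are chained together. */
struct index_entry {
	const unsigned char *ptr;	/* where the block lives in the source */
	struct index_entry *next;
};

/* Scan one hash chain for the longest match against target, but give
 * up after MAX_CHAIN candidates.  Without the cap, a file made of
 * thousands of identical blocks puts them all in one chain, and every
 * lookup compares against every one of them -- the combinatorial
 * explosion described above.  max_len is assumed to stay within the
 * bounds of both buffers. */
static const unsigned char *
find_match(const struct index_entry *chain, const unsigned char *target,
	   size_t max_len, size_t *best_len)
{
	const unsigned char *best = NULL;
	int tried = 0;

	*best_len = 0;
	for (; chain && tried < MAX_CHAIN; chain = chain->next, tried++) {
		size_t n = 0;
		while (n < max_len && chain->ptr[n] == target[n])
			n++;
		if (n > *best_len) {
			*best_len = n;
			best = chain->ptr;
		}
	}
	return best;
}

Capping the chain trades a potentially better match (hence a slightly
larger delta) for bounded lookup time, which is consistent with the
numbers below.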

OK, I just tested a tentative patch to trap that case, and the time to 
delta those two 20MB files dropped from over 9 minutes to only 36 seconds 
here, with less than a 10% difference in delta size.  So I think I might 
be on the right track.  Further tuning might improve things even more.


Nicolas
