Oops... Forgot to complete one paragraph.

On Sat, 25 Feb 2006, Nicolas Pitre wrote:

> I of course looked at the time to pack vs the size reduction in my
> tests. And really, like I said above, the cost is well balanced. The
> only issue is that smaller blocks are more likely to fall into
> pathological data sets. But that problem exists with larger blocks
> too, to a lesser degree of course, but still. For example, using a
> block size of 16 with adler32, computing a delta between two files ...

... as provided by Carl takes up to _nine_ minutes for a _single_ delta!

So regardless of the block size used, the issue right now has more to do with that combinatorial explosion than with the actual block size. And preventing that pathological case from expanding out of bounds is pretty easy to do.

OK, I just tested a tentative patch to trap that case, and the time to delta those two 20MB files went from over 9 minutes down to only 36 seconds here, with less than a 10% difference in delta size. So I think I'm on the right track. Further tuning might help even more.

Nicolas
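
[Editor's note: the tentative patch itself is not shown in this message. As an illustration only, here is a minimal C sketch of one way to bound that kind of search: cap how many entries of a hash-collision chain are examined for each target block, so that inputs with many identical blocks no longer cause a combinatorial blow-up in the match search. All names (BLOCK_SIZE, MAX_CHAIN, hash_entry, find_best_match) are hypothetical and are not taken from git's diff-delta.c.]

/*
 * Sketch of capping the candidate search in a block-based delta
 * generator.  Assumes the source buffer has been indexed into hash
 * chains of BLOCK_SIZE-byte blocks (e.g. keyed by an adler32 of the
 * block); only the search side is shown here.
 */
#include <stddef.h>

#define BLOCK_SIZE 16
#define MAX_CHAIN  64   /* assumed cap on candidates examined per bucket */

struct hash_entry {
	const unsigned char *ptr;   /* start of a block in the source buffer */
	struct hash_entry *next;    /* next source block with the same hash */
};

/* Length of the common run starting at src/trg, bounded by both buffers. */
static size_t match_length(const unsigned char *src, const unsigned char *src_end,
			   const unsigned char *trg, const unsigned char *trg_end)
{
	size_t n = 0;
	while (src < src_end && trg < trg_end && *src == *trg) {
		src++; trg++; n++;
	}
	return n;
}

/*
 * Walk the hash chain for the current target position, but give up
 * after MAX_CHAIN candidates.  Pathological inputs put thousands of
 * identical blocks into one bucket; without the cap the search cost
 * explodes, with it we trade a small amount of delta quality for a
 * bounded running time.
 */
static const unsigned char *find_best_match(const struct hash_entry *chain,
					    const unsigned char *src_end,
					    const unsigned char *trg,
					    const unsigned char *trg_end,
					    size_t *best_len)
{
	const unsigned char *best = NULL;
	int examined = 0;

	*best_len = 0;
	for (; chain && examined < MAX_CHAIN; chain = chain->next, examined++) {
		size_t len = match_length(chain->ptr, src_end, trg, trg_end);
		if (len > *best_len) {
			*best_len = len;
			best = chain->ptr;
		}
	}
	return best;
}

[The 10% size penalty quoted above is consistent with this kind of cutoff: the best match may occasionally sit beyond the cap, but the vast majority of useful matches are found among the first candidates examined.]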