Nicolas Pitre <nico@xxxxxxx> writes:

> Indexing based on adler32 has a match precision based on the block size
> (currently 16).  Lowering the block size would produce smaller deltas
> but the indexing memory and computing cost increases significantly.

Indeed.

I had this patch in my personal tree for a while.  I was wondering why
the progress indication during the "Deltifying" stage sometimes stops
for literally several seconds, or more.

In the Linux 2.6 repository, these object pairs take forever to delta:

    blob 9af06ba723df75fed49f7ccae5b6c9c34bc5115f ->
    blob dfc9cd58dc065d17030d875d3fea6e7862ede143
    size (491102 -> 496045)  58 seconds

    blob 4917ec509720a42846d513addc11cbd25e0e3c4f ->
    blob dfc9cd58dc065d17030d875d3fea6e7862ede143
    size (495831 -> 496045)  64 seconds

Admittedly, these are *BAD* input samples (a binary firmware blob with
many similar-looking ", 0x" sequences).  I can see that trying very
hard to reuse source material would take significant computation.

However, this is simply unacceptable.  The new algorithm takes 58
seconds to produce 136000 bytes of delta, while the old one takes 0.25
seconds to produce 248899 bytes (I am using the test-delta program from
the git.git distribution).  The compression ratio is significantly
better, but this is unusable even for offline archival use (remember,
pack delta selection needs to do window=10 such deltification trials to
come up with the best delta, so you are spending 10 minutes to save
100k on one oddball blob), let alone on-the-fly pack generation for
network transfer.

Maybe we would want the two implementations next to each other, and
internally check if one is taking too many cycles relative to the input
size, then switch to the cheaper version?

-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html