Nicolas Pitre wrote:
> Martin Koegler noted that create_delta() performs a new hash lookup
> after every block copy encoding, which is currently limited to 64KB.
>
> In the case of larger identical blocks, the next hash lookup would normally
> point to the next 64KB block in the reference buffer, and multiple block
> copy operations will be consecutively encoded.
>
> It is, however, possible for the reference buffer to be sparsely indexed if
> hash buckets have been trimmed down in create_delta_index() when hashing
> of the reference buffer isn't well balanced.  In that case the hash
> lookup following a block copy might fail to match anything, and the fact
> that the reference buffer still matches beyond the previous 64KB block
> will be missed.
>
> Let's rework the code so that buffer comparison isn't bounded to 64KB
> anymore.  The match size should be determined as large as possible up front,
> and only then should multiple block copies be encoded to cover it all.
> Also, fewer hash lookups will be performed in the end.
>
> According to Martin, this patch should reduce his 92MB pack down to 75MB
> with the dataset he has.
>
> Tests performed on the Linux kernel repo show a slightly smaller pack and
> a slightly faster repack.
>
> Acked-by: Martin Koegler <mkoegler@xxxxxxxxxxxxxxxxx>
> Signed-off-by: Nicolas Pitre <nico@xxxxxxx>

---

The patch results in a 75 MB pack file for my repository and is faster:

Total 6452 (delta 4581), reused 1522 (delta 0)
10073.11user 5200.33system 4:14:36elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+1371504760minor)pagefaults 0swaps

Regards,
Martin Kögler
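
For illustration, here is a minimal sketch in C of the approach described in the quoted commit message: extend the match against the reference buffer as far as the data allows first, then emit as many copy operations (each capped at 64KB) as needed to cover it, instead of re-doing a hash lookup after every 64KB copy. The helper names (emit_copy, encode_long_match) and the overall structure are hypothetical and are not the actual diff-delta.c code.

    #include <stddef.h>
    #include <stdio.h>

    /* 64KB limit of a single copy operation, as in the delta format */
    #define MAX_COPY_SIZE 0x10000

    /* hypothetical helper: stands in for emitting one copy opcode */
    static void emit_copy(size_t ref_offset, size_t len)
    {
        printf("copy: offset=%zu len=%zu\n", ref_offset, len);
    }

    /*
     * Extend a match between ref[ref_pos..] and src[src_pos..] as far
     * as possible, then encode it as consecutive copy operations.
     * Returns the total matched length.
     */
    static size_t encode_long_match(const unsigned char *ref, size_t ref_size,
                                    size_t ref_pos,
                                    const unsigned char *src, size_t src_size,
                                    size_t src_pos)
    {
        size_t len = 0;

        /* 1. determine the full match length up front */
        while (ref_pos + len < ref_size && src_pos + len < src_size &&
               ref[ref_pos + len] == src[src_pos + len])
            len++;

        /* 2. cover it with multiple copy ops of at most 64KB each */
        size_t remaining = len, offset = ref_pos;
        while (remaining) {
            size_t chunk = remaining < MAX_COPY_SIZE ? remaining : MAX_COPY_SIZE;
            emit_copy(offset, chunk);
            offset += chunk;
            remaining -= chunk;
        }
        return len;
    }

    int main(void)
    {
        const unsigned char ref[] = "abcdefghij";
        const unsigned char src[] = "xxabcdefghij";

        /* the common run "abcdefghij" starts at ref[0] and src[2] */
        size_t n = encode_long_match(ref, sizeof(ref) - 1, 0,
                                     src, sizeof(src) - 1, 2);
        printf("matched %zu bytes\n", n);
        return 0;
    }

With one long match found up front, the hash lookup happens once per match rather than once per 64KB chunk, which is where the reduced lookup count and the smaller pack come from when the index is sparse.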