Geert Bosch <bosch@xxxxxxxx> writes:

> Even though the previous version did really well on large files
> with many changes, performance was lacking for the many small
> files with very few changes that are so common for a VCS.
> ...
> The result has been only a slight increase in delta size for
> very large test cases (but with better performance), and
> both smaller deltas and faster execution speed for repacking
> git.git. I had trouble cloning the Linux kernel repository,
> but am now reasonably confident this will outperform the
> existing algorithm pretty consistently.

Interesting.

Initial impression, the same test as before (a full packing of
the git.git repository that does not have _any_ pack -- all 18k
objects are loose).

First, the incumbent, with the "reusing delta-index" patch applied.

    Total 17724, written 17724 (delta 12002), reused 0 (delta 0)
    34.02user 6.48system 0:42.87elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+434478minor)pagefaults 0swaps
    6188418 pack-nico-f1fac077a093ffdaf094aab2b7f11859ec0c18f1.pack

Then diff-delta.c replaced with your version.

    Total 17724, written 17724 (delta 12012), reused 0 (delta 0)
    44.87user 6.54system 0:54.01elapsed 95%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+441124minor)pagefaults 0swaps
    6099183 pack-geert-f1fac077a093ffdaf094aab2b7f11859ec0c18f1.pack

Second impression, in a recent kernel tree which is mostly
packed. Packing 41k objects (v2.6.16..v2.6.17-rc3), with
"git-pack-objects --no-reuse-delta".
(Nico)

    Total 41591, written 41591 (delta 29285), reused 8563 (delta 0)
    169.08user 12.60system 3:27.68elapsed 87%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (2major+1099928minor)pagefaults 0swaps
    37363966 pack-nico-b9e4339c482cb7d787a2117e6da6eb2114053abc.pack

(Geert)

    Total 41591, written 41591 (delta 29347), reused 8427 (delta 0)
    243.71user 12.32system 4:28.11elapsed 95%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+1077843minor)pagefaults 0swaps
    37165890 pack-geert-b9e4339c482cb7d787a2117e6da6eb2114053abc.pack

Of course, the absolute numbers do not matter, but for the record
these were taken on my Duron 750 with 760MB or so of RAM and
relatively slow disks.

In the kernel repository (the checkout is near the tip of the
source tree), the largest files are fs/nls/nls_cp949.c (900kB,
Korean character encoding), drivers/usb/misc/emi62_fw_s.h (800kB,
Emagic firmware blob), and arch/m68k/ifpsp060/src/fpsp.S (750kB,
floating point emulation?), which is nowhere near the sizes where
your algorithm should really shine.

We would probably want some internal logic that says "if we see
that blobs larger than X MB are involved in the packing, we should
use this version of diff-delta, otherwise the other one."