Junio C Hamano <junkio@xxxxxxx> writes:

> In the kernel repository (checked out is near the tip of the
> source tree), the largest files are fs/nls/nls_cp949.c (900kB
> korean character encoding), drivers/usb/misc/emi62_fw_s.h
> (800kB, Emagic firmware blob), arch/m68k/ifpsp060/src/fpsp.S
> (750kB, floating point emulation?), and nowhere near your
> algorithm really should shine.
>
> We would probably want some internal logic that says "if we see
> that blobs larger than X MB is involved in the packing, we
> should use this version of diff-delta, otherwise the other one."

Third impression, from a synthetic workload: a sequence of revisions
of a single-file project, where the file is a tarball of the git.git
tree (that is, "git-tar-tree vX.Y.Z >tarball"). That gives about 120
objects (per revision, 1 commit and 1 tree holding 1 blob). The
uncompressed sizes of the 40 blobs in the pack range from 2.06MB to
2.86MB (average 2.30MB).

(Nico)
Total 123, written 123 (delta 38), reused 0 (delta 0)
67.26user 1.03system 1:08.76elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+136066minor)pagefaults 0swaps
1822079 pack-nico-26989d516c62197592d0d52db24dfc6a58b633eb.pack

(Geert)
Total 123, written 123 (delta 38), reused 0 (delta 0)
67.23user 1.35system 1:09.25elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+164124minor)pagefaults 0swaps
1683139 pack-geert-26989d516c62197592d0d52db24dfc6a58b633eb.pack

That is a pack about 8% smaller (1683139 vs. 1822079 bytes) in
essentially the same time, which is quite impressive.

But I am _very_ unhappy about this particular synthetic workload.
I wonder if there are projects with many large blobs that are
updated often, so that we could use one as a yardstick. Perhaps
the Wine people have icons, background images and sounds? But I
suspect those are not updated that often.

Thinking about it, it does not make much sense, at least to me, to
store large tarballs, binary blobs or whatnot in an SCM (we are
_not_ in the archival business) and keep track of their changes.
The tarball is out of the question: it is not source in the GPL
sense of the word (it is not the preferred form for making
modifications; you modify the constituent files and bundle up the
result as a new tarball). Graphics images, perhaps.
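The size-based switch suggested in the quoted text ("if we see that blobs larger than X MB is involved in the packing, we should use this version of diff-delta, otherwise the other one") could look roughly like the following. This is only a sketch: pick_delta_algo, DELTA_ALGO_DEFAULT and DELTA_ALGO_LARGE_BLOB are hypothetical names, not existing git code, and the threshold is left as a parameter because no value for X was settled on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the two diff-delta variants being compared. */
enum delta_algo {
	DELTA_ALGO_DEFAULT,    /* the existing diff-delta */
	DELTA_ALGO_LARGE_BLOB  /* the variant that does better on large blobs */
};

/*
 * Decide once for a whole packing run: if any blob is strictly larger
 * than `threshold` bytes (the "X MB" of the message, chosen by the
 * caller), use the large-blob variant; otherwise keep the default.
 */
static enum delta_algo pick_delta_algo(const unsigned long *blob_sizes,
				       size_t nr_blobs,
				       unsigned long threshold)
{
	size_t i;

	/* An empty list is fine; a NULL list with entries is a caller bug. */
	assert(blob_sizes != NULL || nr_blobs == 0);

	for (i = 0; i < nr_blobs; i++)
		if (blob_sizes[i] > threshold)
			return DELTA_ALGO_LARGE_BLOB;
	return DELTA_ALGO_DEFAULT;
}
```

For example, with a 1 MiB threshold (an arbitrary choice), the three kernel files listed above, all under 1MB, would keep the default diff-delta, while the 2-3MB tarball blobs of the synthetic workload would switch to the large-blob variant. Deciding once per packing run follows the quoted suggestion; checking each delta pair individually would be the obvious alternative.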