On Sun, 7 May 2006, Jeff King wrote:
>
>   - Total savings by going shallow: 10.7%
>
> So basically, trees and commits DON'T compress as well as historical
> blobs (potentially because git-pack-objects isn't currently optimized
> for this -- I haven't checked). As a result, we're saving only 10% by
> going shallow instead of a potential 50%.

The biggest size savers from packing are (in rough order of relevance, if
I recall the rough statistics I did):

 - avoiding block boundaries
 - delta packing of blobs
 - delta packing of trees
 - regular compression

The block boundaries are huge: we have tons of small objects, and that was
one of the primary reasons for packing. I'd suspect that this is a 3:1
factor for a lot of things for many "common" filesystem setups. You
probably didn't even account for the size of inodes in your "du" setup.

And blobs with history generally delta very well (_much_ better than
regular compression).

Trees should _delta_ very well, but they basically don't compress,
especially after deltaing. The SHA1's are totally incompressible (in a
tree they aren't even ASCII), and as a delta, the names won't compress
much either because they are short.

Commits are fairly small, shouldn't delta all that much, and they don't
even compress _that_ well either (they're normal text and often have some
redundancy with the committer and author being the same, but they are
short and have some fairly incompressible elements, so..)

The thing with trees in particular is that they are very common for the
kernel (and probably not so much for many other projects). A single commit
ends up quite commonly being just one commit object, one blob (that deltas
really well), and three or four trees. Merges often have no new blobs at
all, just several new trees and the commit object.

So a huge amount of the wins from packing come from the file _history_,
the part that a shallow clone (on purpose) leaves behind.
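To make the block-boundary point concrete, here is a minimal sketch of the arithmetic. It assumes a filesystem that allocates whole 4096-byte blocks per file and ignores inodes and directory entries entirely. The object sizes are made up for illustration, not measured from any repository, and `allocated_size` is a hypothetical helper, not anything in git.

```python
def allocated_size(sizes, block_size=4096):
    """Bytes occupied on disk if every file is rounded up to whole blocks.

    Assumption: one file never shares a block with another file, and
    inode/directory overhead is not counted at all.
    """
    # -(-s // b) is integer ceiling division.
    return sum(-(-s // block_size) * block_size for s in sizes)


# Hypothetical loose-object sizes in bytes (illustrative only).
loose = [180, 250, 400, 900, 1500, 2300]

payload = sum(loose)                 # actual bytes of object data
on_disk = allocated_size(loose)      # each object in its own file
packed = allocated_size([payload])   # same bytes stored in one file

print("payload bytes:      ", payload)
print("as loose files:     ", on_disk)
print("as one packed file: ", packed)
print("loose/packed ratio: ", on_disk / packed)
```

With these invented sizes, 5530 bytes of payload occupy 24576 bytes as six loose files and 8192 bytes as a single file, before any delta or zlib savings are considered. The real ratio depends on the actual object size distribution and the filesystem in use.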
The regular compression will pick up a fair amount of slack with the
blobs, but it's a much smaller factor than the delta compression for
something that has a long history.

It's somewhat interesting to note that over the year that we've used git,
the kernel pack-size hasn't even increased all that much. I forget exactly
what it was when we started packing, but it was on the order of ~75M. It
is now 115M for me. And the old linux-history thing (full BK history over
three years) is 177M - not much more than twice the size of just a few
kernel versions - with some higher packing ratios..

Exactly because blobs delta so incredibly well.

		Linus