On Sat, 12 Sep 2009, René Scharfe wrote:
>
> But what has bugged me since I added zip support is this result:
>
> 	# git v1.6.5-rc0
> 	$ time git archive --format=zip -6 v2.6.31 >/dev/null
>
> 	real	0m16.471s
> 	user	0m16.340s
> 	sys	0m0.128s
>
> I'd have expected this to be the slowest case, because it's compressing
> all files separately, i.e. it needs to create and flush the compression
> context lots of times instead of only once as in the two cases above.

Oh no, I think it's easily explained. Compressing many small files really
is often cheaper than compressing one large one. With lots of small
files, you end up being very limited in the search space, so the
compression decisions get simpler.

Compression in general is not O(n); it has some non-linear factor, often
something like O(n**2). Of course, all compression libraries put an upper
bound on the non-linearity (often expressed as a "window size"), so a
particular compression algorithm may end up being close to O(n) (with a
huge constant). But that upper bound only kicks in for large files; small
files that fit entirely into the compression window will still see the
underlying O(n**2) or whatever.

I have no actual numbers to back up the above blathering, but feel free
to try to compress 10 small files and compare it to compressing one file
that is as big as the sum. I bet you'll see it.

		Linus
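
[Editor's note: a minimal sketch of the experiment suggested above, in C
against zlib's one-shot compress2() API. The chunk size, byte pattern,
and level 6 are arbitrary choices here, not anything from the thread;
compile with "cc bench.c -lz".]

#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <zlib.h>

#define CHUNKS		10
#define CHUNK_SIZE	(1 << 20)	/* 1 MiB per chunk, 10 MiB total */

static double now_sec(void)
{
	struct timespec ts;
	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
	size_t total = (size_t)CHUNKS * CHUNK_SIZE;
	unsigned char *src = malloc(total);
	uLongf bound = compressBound(total);
	unsigned char *dst = malloc(bound);
	uLongf out;
	size_t i;
	double t;

	if (!src || !dst)
		return 1;

	/* Mildly compressible input: a short repeating byte pattern. */
	for (i = 0; i < total; i++)
		src[i] = 'a' + (i * 31 % 17);

	/* One big buffer: a single compression context for all the data. */
	t = now_sec();
	out = bound;
	if (compress2(dst, &out, src, total, 6) != Z_OK)
		return 1;
	printf("1 big buffer:     %.3fs\n", now_sec() - t);

	/* Many small buffers: a fresh context per chunk, like writing one
	 * zip archive entry per file. */
	t = now_sec();
	for (i = 0; i < CHUNKS; i++) {
		out = bound;
		if (compress2(dst, &out, src + i * CHUNK_SIZE,
			      CHUNK_SIZE, 6) != Z_OK)
			return 1;
	}
	printf("%d small buffers: %.3fs\n", CHUNKS, now_sec() - t);

	free(src);
	free(dst);
	return 0;
}

[One caveat: zlib's deflate window is only 32 KiB, so at 1 MiB both runs
are already past the window bound; to see the "fits entirely in the
window" effect, shrink CHUNK_SIZE below 32 KiB and repeat the whole
measurement enough times to get stable timings.]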