On Thu, 6 Dec 2007, Jeff King wrote:

> On Thu, Dec 06, 2007 at 09:18:39AM -0500, Nicolas Pitre wrote:
>
> > > The downside is that the threading partitions the object space, so the
> > > resulting size is not necessarily as small (but I don't know that
> > > anybody has done testing on large repos to find out how large the
> > > difference is).
> >
> > Quick guesstimate is in the 1% ballpark.
>
> Fortunately, we now have numbers. Harvey Harrison reported repacking the
> gcc repo and getting these results:
>
> > /usr/bin/time git repack -a -d -f --window=250 --depth=250
> >
> > 23266.37user 581.04system 7:41:25elapsed 86%CPU (0avgtext+0avgdata 0maxresident)k
> > 0inputs+0outputs (419835major+123275804minor)pagefaults 0swaps
> >
> > -r--r--r-- 1 hharrison hharrison 29091872 2007-12-06 07:26 pack-1d46ca030c3d6d6b95ad316deb922be06b167a3d.idx
> > -r--r--r-- 1 hharrison hharrison 324094684 2007-12-06 07:26 pack-1d46ca030c3d6d6b95ad316deb922be06b167a3d.pack
>
> I tried the threaded repack with pack.threads = 3 on a dual-processor
> machine, and got:
>
> time git repack -a -d -f --window=250 --depth=250
>
> real 309m59.849s
> user 377m43.948s
> sys 8m23.319s
>
> -r--r--r-- 1 peff peff 28570088 2007-12-06 10:11 pack-1fa336f33126d762988ed6fc3f44ecbe0209da3c.idx
> -r--r--r-- 1 peff peff 339922573 2007-12-06 10:11 pack-1fa336f33126d762988ed6fc3f44ecbe0209da3c.pack
>
> So it is about 5% bigger.

Right. I should probably revisit that idea of finding deltas across
partition boundaries to mitigate that loss. And those partitions could
be made coarser as well to reduce the number of such partition gaps
(just increase the value of chunk_size on line 1648 in
builtin-pack-objects.c).

> What is really disappointing is that we saved
> only about 20% of the time. I didn't sit around watching the stages, but
> my guess is that we spent a long time in the single threaded "writing
> objects" stage with a thrashing delta cache.

Maybe you should run the non-threaded repack on the same machine to have
a good comparison. And if you have only 2 CPUs, you will get better
performance with pack.threads = 2; otherwise there'll be wasteful task
switching going on.

And of course, if the delta cache is being thrashed, that might be due to
the way the existing pack was previously packed. Hence the current pack
might impact object _access_ when repacking them. So for a really, really
fair performance comparison, you'd have to preserve the original pack and
swap it back before each repack attempt.


Nicolas
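
For what it's worth, the "preserve the original pack and swap it back" procedure could be scripted roughly as below. This is only a sketch under assumptions that are not from the thread: PACKDIR, BACKUP and the three function names are made up for illustration, the repository is a throwaway clone that is already fully packed (no loose objects), and the script is sourced from the top of its work tree. It also assumes a `time` keyword or utility is available, as in the runs quoted above.

```shell
#!/bin/sh
# Sketch only: PACKDIR, BACKUP and the function names are hypothetical.
# Assumes a throwaway, fully packed clone, with the current directory
# at the top of its work tree.
PACKDIR="${PACKDIR:-.git/objects/pack}"
BACKUP="${BACKUP:-/tmp/original-pack}"

# Copy the starting pack files aside so every run begins from the same state.
save_packs() {
    rm -rf "$BACKUP" &&
    mkdir -p "$BACKUP" &&
    cp "$PACKDIR"/pack-* "$BACKUP"/
}

# Discard whatever the last repack wrote and put the saved packs back.
restore_packs() {
    rm -f "$PACKDIR"/pack-* &&
    cp "$BACKUP"/pack-* "$PACKDIR"/
}

# Repack once per pack.threads value given as an argument, always
# starting from the saved packs, and print timing and pack size.
compare_threads() {
    save_packs || return 1
    for threads in "$@"; do
        restore_packs || return 1
        git config pack.threads "$threads" || return 1
        echo "pack.threads=$threads"
        time git repack -a -d -f --window=250 --depth=250
        ls -l "$PACKDIR"/pack-*.pack
    done
}
```

On a 2-CPU box one would then run something like `compare_threads 1 2` and compare both the elapsed times and the resulting .pack sizes. Note that it leaves pack.threads set to the last value tried.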