On Thu, Dec 06, 2007 at 10:35:22AM -0800, Linus Torvalds wrote:

> > What is really disappointing is that we saved only about 20% of the
> > time. I didn't sit around watching the stages, but my guess is that we
> > spent a long time in the single threaded "writing objects" stage with a
> > thrashing delta cache.
>
> I don't think you spent all that much time writing the objects. That part
> isn't very intensive, it's mostly about the IO.

It can get nasty with super-long deltas thrashing the cache, I think. But
in this case, I think it ended up being just a poor division of labor
caused by the chunk_size parameter using the quite large window size (see
elsewhere in the thread for discussion).

> I suspect you may simply be dominated by memory-throughput issues. The
> delta matching doesn't cache all that well, and using two or more cores
> isn't going to help all that much if they are largely waiting for memory
> (and quite possibly also perhaps fighting each other for a shared cache?
> Is this a Core 2 with the shared L2?)

I think the chunk_size more or less explains it. I have had reasonable
success keeping both CPUs busy on similar tasks in the past (but with
smaller window sizes).

For reference, it was a Core 2 Duo; do they all share L2, or is there
something I can look for in /proc/cpuinfo?

-Peff