On Thu, 18 Dec 2008, James Pickens wrote:
>
> This speeds up operations like 'git clone' on NFS drives tremendously, but
> slows down the same operations on local disks.
>
> Partitioning the work and launching threads is done in unpack-trees.c. The code
> is mostly copied from preload_index.c. The maximum number of threads is set to
> 8, which seemed to give a reasonable tradeoff between performance improvement on
> NFS and degradation on local disks.

Hmm. I don't really like this very much.

Why? Because as your locking shows, we can really only parallelise the
actual write-out anyway, and rather than do any locking there, wouldn't it
be better to have a notion of "queued work" (which would be just the
write-out) to be done in parallel?

So instead of doing all the unpacking etc in parallel (with locking around
it to serialize it), I'd suggest doing all the unpacking serially, since
that isn't the problem anyway (and since you have to protect it with a lock
anyway), and just have a "write out and free the buffer" phase that is done
in the threads.

The alternative would be to do what your patch suggests, but actually try
to make git's SHA1 object-handling code thread-safe. At that point, the
ugly locking in write_entry() would go away, and you might actually improve
performance on SMP thanks to doing the CPU part in parallel.

But as-is, I think the patch is a bit ugly. The reason I liked the index
pre-reading was that it could be done entirely locklessly, so it really did
parallelize it _fully_ (even if the IO latency part was the much bigger
issue), and that was also why it actually ended up helping even on a local
disk (only if you have multiple cores, of course).

		Linus