On Wed, Dec 07, 2011 at 03:11:05PM -0500, J. Bruce Fields wrote:

> > $ time git grep --threads=8 'a.*b' HEAD >/dev/null
> >
> > real 0m8.655s
> > user 0m23.817s
> > sys 0m0.480s
>
> Dumb question (I missed the beginning of the conversation): what kind
> of storage are you using, and is the data already cached?

Sorry, I should have been clear: all of those numbers are with a warm
cache. So this is measuring only CPU.

> I seem to recall part of the motivation for the multithreading being
> NFS, where the goal isn't so much to keep CPU's busy as it is to keep
> the network busy.
>
> Probably a bigger problem for something like "git status", which I
> think ends up doing a series of stat's (which can each require a round
> trip to the server in the NFS case), than it is for something like
> git-grep that's also doing reads.
>
> Just a plea for considering the IO cost as well when making these
> kinds of decisions....

This system has a decent-quality SSD, so the I/O timings are perhaps
not as interesting as they might otherwise be. But here are cold cache
numbers (each run after 'echo 3 >/proc/sys/vm/drop_caches'):

  HEAD, --threads=0:          4.956s
  HEAD, --threads=8:          9.917s
  working tree, --threads=0: 17.444s
  working tree, --threads=8:  6.462s

So when pulling from the object db, threads are still a huge loss
(because the data is compressed, the SSD is fast, and we spend a lot of
CPU time inflating; so it ends up close to the warm cache results). But
for the working tree, the I/O parallelism is a huge win.

So at least on my system, cold cache vs. warm cache leads to the same
conclusion. "git grep --threads=8 ... HEAD" might still be a win on
slow disks or NFS, though.

-Peff
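
P.S. If anybody wants to repeat the comparison on their own hardware, a
rough sketch of the four cold-cache runs above would be something like
the loop below (writing to drop_caches needs root; the 'a.*b' pattern
and the 0/8 thread counts are just the ones from the timings above, and
the results will of course depend on the repo and the disk):

  for t in 0 8; do
          # object db case: grep the tree at HEAD, which means reading
          # and inflating compressed data from the object store
          echo 3 >/proc/sys/vm/drop_caches
          time git grep --threads=$t 'a.*b' HEAD >/dev/null

          # working tree case: same pattern, but reading the
          # checked-out files directly
          echo 3 >/proc/sys/vm/drop_caches
          time git grep --threads=$t 'a.*b' >/dev/null
  done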