On Fri, Apr 26, 2013 at 12:19 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> OK, you have to recompile at least once for experiment, so perhaps
> building the test binary with this patch may help.

So here's my experiment on my machine with this patch and the kernel
tree. First I did the warm-cache case:

	for i in 1 4 8 16 32 64
	do
		echo $i:
		for j in 1 2 3 4
		do
			t=$(sh -c "time git grep --threads=$i hjahsja" 2>&1 | grep real)
			echo $i $t
		done
	done

and the numbers are pretty stable, so here's just the summary (best of
four tries for each case):

	1 real 0m0.598s
	4 real 0m0.253s
	8 real 0m0.235s
	16 real 0m0.269s
	32 real 0m0.412s
	64 real 0m0.420s

so for this machine, 8 threads (our old number) is actually optimal even
though it has just four cores (actually, two cores with HT). I suspect
it's just because the load is slightly unbalanced, so a few extra threads
help. Looks like anything in the 4-16 thread range is ok, though. But 32
threads is clearly too much.

Then I did the exact same thing, but with
"echo 3 > /proc/sys/vm/drop_caches" just before the timing of the git
grep. Again, summarizing (best-of-four numbers; the variation wasn't that
big):

	1 real 0m17.866s
	4 real 0m6.367s
	8 real 0m4.855s
	16 real 0m4.307s
	32 real 0m4.153s
	64 real 0m4.128s

Here, the numbers continue to improve up to 64 threads, although the
difference between 32 and 64 is starting to be in the noise. I suspect
it's a combination of better IO overlap (although not all SSDs will even
improve all that much from overlapping IO past a certain point) and
probably a more noticeable imbalance between threads.

Anyway, for *my* machine and for *this* particular load, I'd say that
we're already pretty close to optimal. I did some trials just to see, but
the best hot-cache numbers were fairly reliably at 7 or 8 threads.

Looking at the numbers, I can't really convince myself that it would be
worth it to do (say) 12 threads on this machine.
Yes, the cold-cache case improves on the 8-thread case (best-of-four for
12 threads: 0m4.467s), but the hot-cache case has gotten sufficiently
worse (0m0.251s) that I'm not sure...

Of course, in *absolute* numbers the cold-cache case is so much slower
that a half-second improvement from going to 16 threads (4.855s down to
4.307s) might be considered worth it: while the hot-cache case gets worse,
we may just not care, because it's fast enough that it's not noticeable.

Anyway, I think your patch is good if for no other reason than that it
allows this kind of testing, but at least for my machine, the current
default of eight threads is clearly "good enough". Maybe somebody with a
very different machine might want to run the above script and see how
sensitive other machines are to this parameter...

                  Linus
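For anyone who wants to repeat both measurements on their own machine,
here is a minimal sketch (an editorial addition, not part of the original
mail) that wraps the cold-cache variant of the loop in a function and adds
a small filter that reduces the four runs per thread count to the best
one. The names `run_cold` and `best_of` are hypothetical helpers, not
part of git. The `sync` before dropping caches is also an addition: the
kernel only frees clean pages when you write to drop_caches. Like the
original loop, it assumes `sh` provides a `time` keyword (bash does; dash
does not), and writing to /proc/sys/vm/drop_caches requires root.

```shell
#!/bin/sh
# Hypothetical helpers (illustrative names, not part of git):
#   run_cold THREADS...  cold-cache version of the loop above; run it as
#                        root from inside the repository being searched.
#   best_of              reads the "THREADS real XmY.YYYs" lines that the
#                        loops print and keeps the fastest run per count.

run_cold() {
	for i in "$@"
	do
		echo "$i:"
		for j in 1 2 3 4
		do
			# Write back dirty pages first; drop_caches only frees clean ones.
			sync
			echo 3 > /proc/sys/vm/drop_caches
			t=$(sh -c "time git grep --threads=$i hjahsja" 2>&1 | grep real)
			echo $i $t
		done
	done
}

best_of() {
	awk '
	$2 == "real" {
		# "0m4.307s" -> p[1] = "0" (minutes), p[2] = "4.307s"
		split($3, p, "m")
		secs = p[1] * 60 + substr(p[2], 1, length(p[2]) - 1)
		if (!($1 in best)) {
			order[n++] = $1    # remember first-seen thread counts
			best[$1] = secs
			raw[$1] = $3
		} else if (secs < best[$1]) {
			best[$1] = secs
			raw[$1] = $3
		}
	}
	END {
		# One line per thread count, in the same format as the summaries.
		for (k = 0; k < n; k++)
			print order[k], "real", raw[order[k]]
	}'
}
```

Typical use would be `run_cold 1 4 8 16 32 64 | tee cold.txt` followed
by `best_of < cold.txt`; the same filter works on the output of the
warm-cache loop. Keeping the raw per-run lines in a file, rather than
only the minimum, makes it easy to check how large the run-to-run
variation actually was.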