Re: "git grep" parallelism question

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Fri, 26 Apr 2013 13:31:41 -0700

On Fri, Apr 26, 2013 at 12:19 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> OK, you have to recompile at least once for experiment, so perhaps
> building the test binary with this patch may help.

So here's my experiment on my machine with this patch and the kernel
tree. First I did the warm-cache case:

  for i in 1 4 8 16 32 64
  do
    echo $i:
    for j in 1 2 3 4
    do
      # bash rather than sh: its "time" keyword prints the "real" line
      t=$(bash -c "time git grep --threads=$i hjahsja" 2>&1 | grep real)
      echo $i $t
    done
  done

and the numbers are pretty stable, so here's just the summary (best of
four tries for each case):

   1 real 0m0.598s
   4 real 0m0.253s
   8 real 0m0.235s
  16 real 0m0.269s
  32 real 0m0.412s
  64 real 0m0.420s

so for this machine, 8 threads (our old number) is actually optimal
even though it has just four cores (actually, two cores with HT). I
suspect it's just because the load is slightly unbalanced, so a few
extra threads help. It looks like anything in the 4-16 thread range is
ok, though. But 32 threads is clearly too much.
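The loop above prints all four timings per thread count and leaves picking the best one to the reader. A minimal sketch that does the selection inside the script, assuming bash (whose `time` keyword honours the `TIMEFORMAT` variable) and a git that accepts `--threads` as in this patch; the helpers `min_time` and `best_of` and the script name `best-of.sh` are hypothetical:

```shell
#!/bin/bash
# Sketch: print the best-of-four "real" time per thread count.
# Assumes bash ("time" keyword + TIMEFORMAT) and git grep --threads.
# "hjahsja" is just a string that never matches, as in the loop above.

# Print the smallest number read from stdin, one number per line.
min_time() {
  sort -n | head -n 1
}

# Run "git grep --threads=THREADS PATTERN" TRIES times, print best real time.
best_of() {
  threads=$1
  tries=$2
  pattern=$3
  TIMEFORMAT=%R   # bash: make "time" print only the elapsed seconds
  n=0
  while [ "$n" -lt "$tries" ]
  do
    # "time" reports on the shell's stderr: capture that, and throw
    # away git grep's own output
    { time git grep --threads="$threads" "$pattern" >/dev/null 2>&1; } 2>&1
    n=$((n + 1))
  done | min_time
}

# Usage, from the top of the tree: ./best-of.sh 1 4 8 16 32 64
for i in "$@"
do
  echo "$i real $(best_of "$i" 4 hjahsja)s"
done
```

Taking the thread counts as arguments also makes a finer sweep cheap, e.g. `./best-of.sh $(seq 4 12)` to see how sharp the optimum is around 8.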

Then I did the exact same thing, but with the "echo 3 >
/proc/sys/vm/drop_caches" just before the timing of the git grep.
Again, summarizing (best-of-four number, the variation wasn't that
big):

   1 real 0m17.866s
   4 real 0m6.367s
   8 real 0m4.855s
  16 real 0m4.307s
  32 real 0m4.153s
  64 real 0m4.128s

here, the numbers continue to improve up to 64 threads, although the
difference between 32 and 64 is starting to be in the noise. I suspect
it's a combination of better IO overlap (although not all SSDs will
even improve all that much from overlapping IO past a certain point)
and probably a more noticeable imbalance between threads.
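The cold-cache runs differ from the warm ones only in dropping the page cache right before each timing. A minimal sketch of that variant, assuming Linux (writing 3 to `/proc/sys/vm/drop_caches` drops the page cache plus dentries and inodes), root privileges, and bash for the `time` keyword; `sync` runs first so dirty pages are written back and can be dropped too. The function names and `cold.sh` are hypothetical:

```shell
#!/bin/bash
# Sketch of the cold-cache variant: same loop, page cache dropped first.
# Assumes Linux, root (needed to write drop_caches) and bash's "time".

# Flush dirty pages, then drop page cache, dentries and inodes (3 = both).
drop_caches() {
  sync
  echo 3 > /proc/sys/vm/drop_caches
}

# Time one cold-cache "git grep" run; prints the elapsed real seconds.
cold_grep_time() {
  threads=$1
  drop_caches || return 1
  TIMEFORMAT=%R   # bash: "time" prints only the real time
  { time git grep --threads="$threads" hjahsja >/dev/null 2>&1; } 2>&1
  # git grep exits 1 when nothing matches, which is expected here
  return 0
}

# Usage (as root, inside the tree): ./cold.sh 1 4 8 16 32 64
for i in "$@"
do
  for j in 1 2 3 4
  do
    t=$(cold_grep_time "$i") || { echo "cannot drop caches (need root)" >&2; exit 1; }
    echo "$i real ${t}s"
  done
done
```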

Anyway, for *my* machine and for *this* particular load, I'd say that
we're already pretty close to optimal. I did some trials just to see,
but the best hot-cache numbers were fairly reliably at 7 or 8
threads.

Looking at the numbers, I can't really convince myself that it would
be worth it to do (say) 12 threads on this machine. Yes, the
cold-cache case improves on the 8-thread case (best-of-four for 12
threads: 0m4.467s, vs 0m4.855s with 8), but the hot-cache case has
gotten sufficiently worse (0m0.251s, vs 0m0.235s with 8) that I'm not
sure it's a net win.

Of course, in *absolute* numbers the cold-cache case is so much slower
that a half-second improvement from going to 16 threads might be
considered worth it, because while the hot-cache case gets worse,
we may just not care because it's fast enough that it's not
noticeable.
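To put the absolute and relative changes side by side, here is the arithmetic on the best-of-four numbers quoted in this mail (12 and 16 threads against the 8-thread baseline). It is only a calculation over those figures; `delta` is a hypothetical helper that uses awk for the floating-point math:

```shell
#!/bin/sh
# Worked arithmetic on the best-of-four numbers from this mail (seconds).
# Positive means slower than the 8-thread baseline, negative means faster.

# $1 = time with 8 threads, $2 = time with the other thread count
delta() {
  awk -v base="$1" -v t="$2" \
    'BEGIN { printf "%+.3fs (%+.0f%%)\n", t - base, 100 * (t - base) / base }'
}

echo "12 threads: hot $(delta 0.235 0.251), cold $(delta 4.855 4.467)"
echo "16 threads: hot $(delta 0.235 0.269), cold $(delta 4.855 4.307)"
```

This prints `16 threads: hot +0.034s (+14%), cold -0.548s (-11%)` on its second line: a larger relative loss in the hot case, but one measured in tens of milliseconds against roughly half a second saved cold.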

Anyway, I think your patch is good, if for no other reason than that it
allows this kind of testing, but at least for my machine, clearly the
current default of eight threads is actually "good enough". Maybe
somebody with a very different machine might want to run the above
script and see how sensitive other machines are to this parameter.
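For comparing very different machines, one option is to scale the thread counts to the number of online CPUs instead of hard-coding 1 to 64. A minimal sketch; the multipliers are an arbitrary choice, `thread_counts` is a hypothetical helper, and `getconf _NPROCESSORS_ONLN` is a common extension (glibc, macOS) rather than something POSIX guarantees, hence the fallback to 1:

```shell
#!/bin/sh
# Sketch: derive thread counts to try from the number of online CPUs.
ncpu=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Print 1, n/2, n, 2n, 4n and 8n, ascending, without duplicates or zeros.
thread_counts() {
  n=$1
  for m in 1 $((n / 2)) "$n" $((2 * n)) $((4 * n)) $((8 * n))
  do
    [ "$m" -ge 1 ] && echo "$m"
  done | sort -n -u
}

echo "cpus: $ncpu"
echo "try: $(thread_counts "$ncpu" | tr '\n' ' ')"
```

On a machine with two HT cores (four logical CPUs) this suggests 1 2 4 8 16 32, which could replace the hard-coded list in the loop.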

                   Linus