On Wed, Dec 07, 2011 at 03:11:05PM -0500, J. Bruce Fields wrote:

> > $ time git grep --threads=8 'a.*b' HEAD >/dev/null
> >
> > real 0m8.655s
> > user 0m23.817s
> > sys 0m0.480s
>
> Dumb question (I missed the beginning of the conversation): what kind
> of storage are you using, and is the data already cached?

Sorry, I should have been clear: all of those numbers are with a warm
cache. So this is measuring only CPU.

> I seem to recall part of the motivation for the multithreading being
> NFS, where the goal isn't so much to keep CPU's busy as it is to keep
> the network busy.
>
> Probably a bigger problem for something like "git status", which I
> think ends up doing a series of stat's (which can each require a round
> trip to the server in the NFS case), than it is for something like
> git-grep that's also doing reads.
>
> Just a plea for considering the IO cost as well when making these
> kinds of decisions....

This system has a decent-quality SSD, so the I/O timings are perhaps
not as interesting as they might otherwise be. But here are cold cache
numbers (each run after 'echo 3 >/proc/sys/vm/drop_caches'):

  HEAD, --threads=0:          4.956s
  HEAD, --threads=8:          9.917s
  working tree, --threads=0: 17.444s
  working tree, --threads=8:  6.462s

So when pulling from the object db, threads are still a huge loss
(because the data is compressed, the SSD is fast, and we spend a lot of
CPU time inflating; so it ends up close to the warm cache results). But
for the working tree, the I/O parallelism is a huge win.

So at least on my system, cold cache vs. warm cache leads to the same
conclusion. "git grep --threads=8 ... HEAD" might still be a win on
slow disks or NFS, though.

-Peff
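
P.S. If anybody wants to repeat the comparison on their own hardware, a
rough sketch of the four cold-cache runs above would be something like
the loop below (writing to drop_caches needs root; the 'a.*b' pattern
and the 0/8 thread counts are just the ones from the timings above, and
the results will of course depend on the repo and the disk):

  for t in 0 8; do
          # object db case: grep the tree at HEAD, which means reading
          # and inflating compressed data from the object store
          echo 3 >/proc/sys/vm/drop_caches
          time git grep --threads=$t 'a.*b' HEAD >/dev/null

          # working tree case: same pattern, but reading the
          # checked-out files directly
          echo 3 >/proc/sys/vm/drop_caches
          time git grep --threads=$t 'a.*b' >/dev/null
  done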