On Fri, Dec 02, 2011 at 02:07:45PM +0100, Thomas Rast wrote:

> where I put the --cached originally because that makes it independent
> of the worktree (which in the very first measurements I still had
> wiped, as I tend to do for this repo; I checked it out again after
> that).  This in fact gives me (~/g/git-grep --cached
> INITRAMFS_ROOT_UID, leaving aside -W; best of 10):
>
>   THREADS=8:   2.88user 0.21system 0:02.94elapsed
>   THREADS=4:   2.89user 0.29system 0:02.99elapsed
>   THREADS=2:   2.83user 0.36system 0:02.87elapsed
>   NO_PTHREADS: 2.16user 0.08system 0:02.25elapsed
>
> Uhuh.  Doesn't scale so well after all.  But removing the --cached, as
> most people probably would:
>
>   THREADS=8:   0.19user 0.32system 0:00.16elapsed
>   THREADS=4:   0.16user 0.34system 0:00.17elapsed
>   THREADS=2:   0.18user 0.32system 0:00.26elapsed
>   NO_PTHREADS: 0.12user 0.17system 0:00.31elapsed
>
> So I conclude that during any grep that cannot use the worktree,
> having any threads hurts.

Wow, that's horrible. Leaving aside the parallelism, it's just terrible
that reading from the cache is 20 times slower than the worktree. I get
similar results on my quad-core machine.

A quick perf run shows most of the time is spent inflating objects. The
diff code has a sneaky trick to re-use worktree files when we know they
are stat-clean (in diff's case it is to avoid writing a tempfile). I
wonder if we should use the same trick here.

It would hurt the cold cache case, though, as the compressed versions
require fewer disk accesses, of course.

-Peff

PS I suspect your timings are somewhat affected by the simplicity of the
   regex you are asking for. The time to inflate the blobs dominates,
   because the search is just a memmem().
   On my quad-core w/ hyperthreading (i.e., 8 apparent cores):

     [no caching, simple regex; we get some parallelism, but the regex
      task is just not that intensive]
     $ /usr/bin/time git grep INITRAMFS_ROOT_UID >/dev/null
     0.42user 0.45system 0:00.15elapsed 578%CPU

     [no caching, harder regex; we get much higher CPU utilization]
     $ /usr/bin/time git grep 'a.*b' >/dev/null
     14.68user 0.50system 0:02.00elapsed 758%CPU

     [with caching, simple regex; we get almost _no_ parallelism
      because all of our time is spent inflating under a lock, and the
      regex task takes very little time]
     $ /usr/bin/time git grep --cached INITRAMFS_ROOT_UID >/dev/null
     7.64user 0.41system 0:07.61elapsed 105%CPU

     [with caching, harder regex; not as much parallelism as we hoped
      for, but still much more than before, because there is actually
      work to parallelize in the regex]
     $ /usr/bin/time git grep --cached 'a.*b' >/dev/null
     23.46user 0.47system 0:08.42elapsed 284%CPU

   So I think there is value in parallelizing even --cached greps. But
   we could do so much better if blob inflation could be done in
   parallel.
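
As a rough cross-check of the timings above, the sketch below recomputes the %CPU column from the user, system and elapsed figures, and then estimates how much of each --cached run is serialized. The helper names (cpu_percent, serial_seconds) and the simple model (elapsed = serial + parallel / threads, with the 8 hyperthreads treated as 8 full cores) are assumptions for illustration, not anything taken from git. Hyperthreads are not full cores, so the serial estimate is only indicative.

```python
# Illustrative sketch only: the helper names and the timing model are
# assumptions for this example, not part of git.

def cpu_percent(user, system, elapsed):
    """CPU utilization in percent: CPU seconds used per wall-clock second."""
    return 100.0 * (user + system) / elapsed


def serial_seconds(user, system, elapsed, threads):
    """Estimate the serialized time S from the assumed model
    elapsed = S + (W - S) / threads, where W = user + system is the
    total CPU work. Treats `threads` as that many full cores."""
    work = user + system
    return (elapsed - work / threads) / (1.0 - 1.0 / threads)


# (label, user, system, elapsed) copied from the four runs in the message
RUNS = [
    ("worktree, simple regex", 0.42, 0.45, 0.15),
    ("worktree, 'a.*b'", 14.68, 0.50, 2.00),
    ("--cached, simple regex", 7.64, 0.41, 7.61),
    ("--cached, 'a.*b'", 23.46, 0.47, 8.42),
]

if __name__ == "__main__":
    for label, user, system, elapsed in RUNS:
        pct = cpu_percent(user, system, elapsed)
        serial = serial_seconds(user, system, elapsed, threads=8)
        print(f"{label:24s} {pct:6.1f}%CPU  serial ~{serial:5.2f}s")
```

Under that assumed model, the harder-regex --cached run comes out at roughly 6 seconds of serialized work, the same order as the entire simple-regex --cached run, which is consistent with inflation under the lock being the bottleneck in both cases.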