Hello Pete,
Thank you for the feedback.
On 11/06/2011 03:50 PM, Pete Wyckoff wrote:
From: Eric Herman<eric@xxxxxxxxxx>
Change the number of threads that we spawn from a hardcoded value of
"8" to what online_cpus() returns.
I agree with the need to exploit>8 CPUs, but I lose a lot of
performance when limiting the threads to the number of physical
CPUs.
Ah, yes, Being focused on big machines, I did not actually test with low
CPU machines, certainly not with NFS mounts.
Tests without your patch on master, just changing "#define
THREADS" from 8 to 2. On a 2-core Intel Core2 Duo.
Producing lots of output:
8 threads:
$ time ~/u/src/git/bin-wrappers/git grep f> /dev/null
0m14.02s user 0m3.64s sys 0m11.93s elapsed 148.07 %CPU
$ time ~/u/src/git/bin-wrappers/git grep f> /dev/null
0m13.86s user 0m3.70s sys 0m11.82s elapsed 148.57 %CPU
2 threads:
$ time ~/u/src/git/bin-wrappers/git grep f> /dev/null
0m15.14s user 0m3.52s sys 0m24.22s elapsed 77.05 %CPU
$ time ~/u/src/git/bin-wrappers/git grep f> /dev/null
0m14.85s user 0m3.79s sys 0m24.20s elapsed 77.05 %CPU
Producing no output:
8 threads:
$ time ~/u/src/git/bin-wrappers/git grep unfindable-string
0m1.14s user 0m3.68s sys 0m5.17s elapsed 93.22 %CPU
$ time ~/u/src/git/bin-wrappers/git grep unfindable-string
0m1.28s user 0m3.56s sys 0m5.15s elapsed 94.22 %CPU
2 threads:
$ time ~/u/src/git/bin-wrappers/git grep unfindable-string
0m1.36s user 0m3.64s sys 0m16.82s elapsed 29.75 %CPU
$ time ~/u/src/git/bin-wrappers/git grep unfindable-string
0m1.38s user 0m3.66s sys 0m16.81s elapsed 30.04 %CPU
My workdir is on NFS, where even though the repository is fully
cached, the open()s must go to the server. Using more threads
than CPUs makes it more likely that some thread isn't blocked.
This is good data.
It gives me ideas for how I can do some more testing.
You could add a #threads knob,
Sure, adding a knob is not a bad idea.
but then we'd have to get
everybody on NFS to set that properly.
Indeed, I think you agree that it would be better if there was no need
for most people to fiddle with yet another knob.
Or take a look at
preload_index() to see how it guesses at how many threads it
needs.
Good tip.
A quick peek at preload_index suggests that it was a bit of guesswork:
/*
* Mostly randomly chosen maximum thread counts: we
* cap the parallelism to 20 threads, and we want
* to have at least 500 lstat's per thread for it to
* be worth starting a thread.
*/
However, your comments make me wonder if a rule-of-thumb like "3 +
online_cpus()" would yield better results across both large and small
numbers of cores with either blazing fast or very slow storage.
I will create a setup similar to the one you describe and do some
exploration.
Cheers,
-Eric
--
http://www.freesa.org/ -- mobile: +31 620719662
aim: ericigps -- skype: eric_herman -- jabber: eric.herman@xxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html