Re: [PATCH] grep: detect number of CPUs for thread spawning

Hello Pete,

Thank you for the feedback.

On 11/06/2011 03:50 PM, Pete Wyckoff wrote:

From: Eric Herman <eric@xxxxxxxxxx>

Change the number of threads that we spawn from a hardcoded value of
"8" to what online_cpus() returns.


I agree with the need to exploit >8 CPUs, but I lose a lot of
performance when limiting the threads to the number of physical
CPUs.

Ah, yes. Being focused on big machines, I did not actually test on machines with few CPUs, and certainly not with NFS mounts.


Tests without your patch on master, just changing "#define
THREADS" from 8 to 2.  On a 2-core Intel Core2 Duo.

Producing lots of output:

     8 threads:

	$ time ~/u/src/git/bin-wrappers/git grep f > /dev/null
	0m14.02s user 0m3.64s sys 0m11.93s elapsed 148.07 %CPU
	$ time ~/u/src/git/bin-wrappers/git grep f > /dev/null
	0m13.86s user 0m3.70s sys 0m11.82s elapsed 148.57 %CPU

     2 threads:

	$ time ~/u/src/git/bin-wrappers/git grep f > /dev/null
	0m15.14s user 0m3.52s sys 0m24.22s elapsed 77.05 %CPU
	$ time ~/u/src/git/bin-wrappers/git grep f > /dev/null
	0m14.85s user 0m3.79s sys 0m24.20s elapsed 77.05 %CPU

Producing no output:

     8 threads:

	$ time ~/u/src/git/bin-wrappers/git grep unfindable-string
	0m1.14s user 0m3.68s sys 0m5.17s elapsed 93.22 %CPU
	$ time ~/u/src/git/bin-wrappers/git grep unfindable-string
	0m1.28s user 0m3.56s sys 0m5.15s elapsed 94.22 %CPU

     2 threads:

	$ time ~/u/src/git/bin-wrappers/git grep unfindable-string
	0m1.36s user 0m3.64s sys 0m16.82s elapsed 29.75 %CPU
	$ time ~/u/src/git/bin-wrappers/git grep unfindable-string
	0m1.38s user 0m3.66s sys 0m16.81s elapsed 30.04 %CPU

My workdir is on NFS, where even though the repository is fully
cached, the open()s must go to the server.  Using more threads
than CPUs makes it more likely that some thread isn't blocked.

This is good data.
It gives me ideas for how I can do some more testing.


You could add a #threads knob,

Sure, adding a knob is not a bad idea.


but then we'd have to get
everybody on NFS to set that properly.

Indeed, I think you agree that it would be better if most people did not need to fiddle with yet another knob.


 Or take a look at
preload_index() to see how it guesses at how many threads it
needs.

Good tip.
A quick peek at preload_index suggests that it was a bit of guesswork:

/*
 * Mostly randomly chosen maximum thread counts: we
 * cap the parallelism to 20 threads, and we want
 * to have at least 500 lstat's per thread for it to
 * be worth starting a thread.
 */
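
To make that comment concrete, here is a minimal sketch of the kind of heuristic it describes. This is not the actual preload_index() code; the function name guess_thread_count() and the macro names are hypothetical, and only the two numbers (20 and 500) come from the comment itself.

```c
#include <assert.h>

/* Hypothetical names; the two values are taken from the quoted comment. */
#define MAX_PARALLEL 20   /* cap on the number of threads */
#define THREAD_COST 500   /* minimum lstat() calls per thread */

/*
 * Return how many threads to start for `nr_entries` index entries,
 * or 0 when there is too little work to justify threading.
 */
static int guess_thread_count(int nr_entries)
{
	int threads;

	assert(nr_entries >= 0);
	threads = nr_entries / THREAD_COST;
	if (threads < 2)
		return 0;	/* not worth it: stay single-threaded */
	if (threads > MAX_PARALLEL)
		threads = MAX_PARALLEL;
	return threads;
}
```

Note that a heuristic of this shape depends only on the amount of work, not on the number of CPUs, so it would start the same number of threads on a 2-core and a 64-core machine.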

However, your comments make me wonder if a rule-of-thumb like "3 + online_cpus()" would yield better results across both large and small numbers of cores with either blazing fast or very slow storage.
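
A rough sketch of that rule of thumb, only to show the arithmetic. It is not a patch: thread_count_for() and grep_thread_count() are hypothetical names, and sysconf(_SC_NPROCESSORS_ONLN) is used here as a stand-in for online_cpus(), assuming a platform (such as Linux with glibc) that provides it.

```c
#include <assert.h>
#include <unistd.h>

/* The "3" from the rule of thumb; untested, chosen only for illustration. */
#define EXTRA_THREADS 3

/*
 * Pure helper: thread count for a given number of online CPUs.
 * A non-positive `cpus` (e.g. a failed lookup) is treated as 1.
 */
static int thread_count_for(long cpus, int extra)
{
	assert(extra >= 0);
	if (cpus < 1)
		cpus = 1;
	return (int)cpus + extra;
}

/* Stand-in for online_cpus(): sysconf() returns -1 on error. */
static int grep_thread_count(void)
{
	long cpus = sysconf(_SC_NPROCESSORS_ONLN);

	return thread_count_for(cpus, EXTRA_THREADS);
}
```

On the 2-core machine above this would give 5 threads, which falls between the two measured configurations (2 and 8), so whether it recovers the 8-thread timings on NFS is exactly what still needs testing.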

I will create a setup similar to the one you describe and do some exploration.

Cheers,
 -Eric

--
http://www.freesa.org/ -- mobile: +31 620719662
aim: ericigps -- skype: eric_herman -- jabber: eric.herman@xxxxxxxxx