Re: [PATCH v4] Threaded grep

On Mon, 25 Jan 2010, Fredrik Kuivinen wrote:
> 
> The results below are best of five runs in the Linux repository (on a
> box with two cores).
> 
> git grep qwerty

Before:

	real	0m0.531s
	user	0m0.412s
	sys	0m0.112s

After:

	real	0m0.151s
	user	0m0.720s
	sys	0m0.272s


> $ /usr/bin/time git grep void

Before:

	real	0m1.144s
	user	0m0.988s
	sys	0m0.148s

After:

	real	0m0.290s
	user	0m1.732s
	sys	0m0.232s

So it's helping a lot (~3.5x and ~3.9x) on this 4-core HT setup. 

I don't seem to ever get more than a 4x speedup, so my guess is that HT 
simply isn't able to do much of anything with this load. 
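
For reference, the speedup figures above can be recomputed from the timings with a small sketch like the one below. The helper names speedup() and cpu_overhead() are made up for illustration and are not part of git or of the patch. Besides the ~3.5x and ~3.9x wall-clock gains, the same numbers show total CPU time (user + sys) growing by roughly 1.9x and 1.7x in the threaded runs.

```c
#include <assert.h>

/* Wall-clock speedup: how many times faster the "after" run finished. */
static double speedup(double real_before, double real_after)
{
	assert(real_after > 0.0);
	return real_before / real_after;
}

/*
 * CPU overhead: total CPU time (user + sys) of the "after" run,
 * relative to the "before" run. Values above 1.0 mean the threaded
 * run burned more CPU in total, even though it finished sooner.
 */
static double cpu_overhead(double user_before, double sys_before,
			   double user_after, double sys_after)
{
	assert(user_before + sys_before > 0.0);
	return (user_after + sys_after) / (user_before + sys_before);
}
```

For the "git grep qwerty" run this would be called as speedup(0.531, 0.151) and cpu_overhead(0.412, 0.112, 0.720, 0.272).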

The profile for the threaded case says:

    51.73%      git  libc-2.11.1.so                 [.] re_search_internal
    11.47%      git  [kernel]                       [k] copy_user_generic_string
     2.90%      git  libc-2.11.1.so                 [.] __strlen_sse2
     2.66%      git  [kernel]                       [k] link_path_walk
     2.55%      git  [kernel]                       [k] intel_pmu_enable_all
     2.40%      git  [kernel]                       [k] __d_lookup
     1.71%      git  libc-2.11.1.so                 [.] __GI___libc_malloc
     1.55%      git  [kernel]                       [k] _raw_spin_lock
     1.43%      git  [kernel]                       [k] sys_futex
     1.30%      git  libc-2.11.1.so                 [.] __cfree
     1.28%      git  [kernel]                       [k] intel_pmu_disable_all
     1.25%      git  libc-2.11.1.so                 [.] __GI_memchr
     1.14%      git  libc-2.11.1.so                 [.] _int_malloc
     1.02%      git  [kernel]                       [k] effective_load

and the only thing that makes me go "eh?" there is the strlen(). Why is 
that so hot?  But locking doesn't seem to be the biggest issue, and in 
general I think this is all pretty good. The 'effective_load' thing is the 
scheduler, so there's certainly some context switching going on, probably 
still due to excessive synchronization, but it's equally clear that it 
is not a dominant factor.

One potentially interesting data point is that if I make NR_THREADS be 16, 
performance goes down, and I get more locking overhead. So NR_THREADS of 8 
works well on this machine.
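
To illustrate why more threads can mean more locking overhead, here is a minimal sketch of a fixed-size worker pool that hands out work under a single mutex. It is not the patch's implementation: struct work_queue, worker() and run_pool() are hypothetical names, and marking an array slot stands in for grepping one file. Every worker takes the same lock for every item, so raising NR_THREADS adds contention on that lock without adding cores.

```c
#include <pthread.h>
#include <stddef.h>

#define NR_THREADS 8	/* the value reported to work well above */

struct work_queue {
	pthread_mutex_t lock;
	size_t next;	/* index of the next unclaimed item */
	size_t total;	/* number of items */
	int *done;	/* done[i] is set to 1 once item i is processed */
};

static void *worker(void *arg)
{
	struct work_queue *q = arg;

	for (;;) {
		size_t i;

		/* Claim one item; all threads contend on this lock. */
		pthread_mutex_lock(&q->lock);
		if (q->next == q->total) {
			pthread_mutex_unlock(&q->lock);
			break;
		}
		i = q->next++;
		pthread_mutex_unlock(&q->lock);

		q->done[i] = 1;	/* stand-in for grepping one file */
	}
	return NULL;
}

/* Returns 0 on success, -1 if setup or thread creation failed. */
static int run_pool(int *done, size_t total)
{
	struct work_queue q;
	pthread_t threads[NR_THREADS];
	int i, started = 0, ret = 0;

	q.next = 0;
	q.total = total;
	q.done = done;
	if (pthread_mutex_init(&q.lock, NULL))
		return -1;

	for (i = 0; i < NR_THREADS; i++) {
		if (pthread_create(&threads[i], NULL, worker, &q)) {
			ret = -1;
			break;
		}
		started++;
	}
	/* Threads that did start still drain the queue before we return. */
	for (i = 0; i < started; i++)
		pthread_join(threads[i], NULL);

	pthread_mutex_destroy(&q.lock);
	return ret;
}
```

Each item here is tiny, so the lock dominates; with real per-file work the lock is held for a much smaller fraction of the time.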

So ack from me. The patch looks reasonably clean too, at least for 
something as complex as a multi-threaded grep.

One worry is, of course, whether all regcomp()/regexec() implementations 
are thread-safe. Maybe there are broken libraries that have hidden global 
state in them?

			Linus