Re: [PATCH v4] Threaded grep

Fredrik Kuivinen <frekui@xxxxxxxxx> · Tue, 26 Jan 2010 13:10:50 +0100

On Tue, Jan 26, 2010 at 00:59, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> The profile for the threaded case says:
>
>    51.73%      git  libc-2.11.1.so                 [.] re_search_internal
>    11.47%      git  [kernel]                       [k] copy_user_generic_string
>     2.90%      git  libc-2.11.1.so                 [.] __strlen_sse2
>     2.66%      git  [kernel]                       [k] link_path_walk
>     2.55%      git  [kernel]                       [k] intel_pmu_enable_all
>     2.40%      git  [kernel]                       [k] __d_lookup
>     1.71%      git  libc-2.11.1.so                 [.] __GI___libc_malloc
>     1.55%      git  [kernel]                       [k] _raw_spin_lock
>     1.43%      git  [kernel]                       [k] sys_futex
>     1.30%      git  libc-2.11.1.so                 [.] __cfree
>     1.28%      git  [kernel]                       [k] intel_pmu_disable_all
>     1.25%      git  libc-2.11.1.so                 [.] __GI_memchr
>     1.14%      git  libc-2.11.1.so                 [.] _int_malloc
>     1.02%      git  [kernel]                       [k] effective_load
>
> and the only thing that makes me go "eh?" there is the strlen(). Why is
> that so hot?  But locking doesn't seem to be the biggest issue, and in
> general I think this is all pretty good. The 'effective_load' thing is the
> scheduler, so there's certainly some context switching going on, probably
> still due to excessive synchronization, but it's equally clear that that
> is certainly not a dominant factor.

I see the strlen in my profiles as well, but I haven't figured out
where it comes from. I get the following:

    51.16%  git-grep  /lib/tls/i686/cmov/libc-2.10.1.so  [.] 0x000000000b14c6
    10.12%  git-grep  /lib/tls/i686/cmov/libc-2.10.1.so  [.] __GI_strlen
     9.27%  git-grep  [kernel]                           [k] __copy_to_user_ll
     4.68%  git-grep  /lib/tls/i686/cmov/libc-2.10.1.so  [.] __memchr
     1.72%  git-grep  [kernel]                           [k] __d_lookup
     1.18%  git-grep  /lib/i686/cmov/libcrypto.so.0.9.8  [.] sha1_block_asm_data_order
     1.11%  git-grep  [kernel]                           [k] __ticket_spin_lock
     0.84%  git-grep  [vdso]                             [.] 0x00000000b6c422

If I use perf record -g I get

    10.39%  git-grep  /lib/tls/i686/cmov/libc-2.10.1.so  [.] __GI_strlen
                |
                |--99.05%-- look_ahead
                |          grep_buffer_1
                |          grep_buffer
                |          run
                |          start_thread
                |          __clone
                |
                |--0.64%-- grep_file
                |          grep_cache
                |          cmd_grep
                |          run_builtin
                |          handle_internal_command
                |          main
                |          __libc_start_main
                |          0x804ae81
                 --0.32%-- [...]

This doesn't make much sense to me as look_ahead doesn't call strlen
(I compiled git with -O0 to avoid any issues with inlined functions).
I haven't used perf much, though, so maybe I'm reading the output the
wrong way.

> One potentially interesting data point is that if I make NR_THREADS be 16,
> performance goes down, and I get more locking overhead. So NR_THREADS of 8
> works well on this machine.

Interesting. I get the best results with 8 threads as well, but I only
have two cores.

> One worry is, of course, whether all regex() implementations are
> thread-safe. Maybe there are broken libraries that have hidden global
> state in them?

That would certainly be a problem. A quick Google search didn't show
any known bugs. Of course, this doesn't tell us anything about the
unknown ones.
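
One conservative way to reduce the exposure to such bugs would be to
give every worker thread its own compiled pattern, so that no regex_t
is ever shared between threads. The following is only a sketch under
that assumption, not what the patch does; grep_work, grep_worker,
count_matching_lines and MAX_WORKERS are made-up names for
illustration. It would not help against a library that keeps global
state outside the regex_t.

```c
#include <pthread.h>
#include <regex.h>
#include <stddef.h>

#define MAX_WORKERS 8	/* hypothetical upper bound for this sketch */

/* Hypothetical work item: one contiguous slice of lines per thread. */
struct grep_work {
	const char *pattern;
	const char **lines;
	size_t nr_lines;
	size_t nr_hits;
	int failed;
};

static void *grep_worker(void *arg)
{
	struct grep_work *w = arg;
	regex_t re;	/* private to this thread, never shared */
	size_t i;

	if (regcomp(&re, w->pattern, REG_NOSUB)) {
		w->failed = 1;
		return NULL;
	}
	for (i = 0; i < w->nr_lines; i++)
		if (!regexec(&re, w->lines[i], 0, NULL, 0))
			w->nr_hits++;
	regfree(&re);
	return NULL;
}

/* Returns the number of matching lines, or -1 on error. */
static long count_matching_lines(const char *pattern, const char **lines,
				 size_t nr_lines, int nr_threads)
{
	pthread_t tid[MAX_WORKERS];
	struct grep_work work[MAX_WORKERS];
	size_t per, off = 0;
	long total = 0;
	int started = 0, err = 0, i;

	if (nr_threads < 1 || nr_threads > MAX_WORKERS)
		return -1;

	/* Split the lines into at most nr_threads equally sized slices. */
	per = (nr_lines + (size_t)nr_threads - 1) / (size_t)nr_threads;
	for (i = 0; i < nr_threads; i++) {
		size_t left = nr_lines - off;
		size_t n = left < per ? left : per;

		work[i].pattern = pattern;
		work[i].lines = lines + off;
		work[i].nr_lines = n;
		work[i].nr_hits = 0;
		work[i].failed = 0;
		off += n;
		if (pthread_create(&tid[i], NULL, grep_worker, &work[i])) {
			err = 1;
			break;
		}
		started++;
	}

	/* Join only the threads that were actually started. */
	for (i = 0; i < started; i++) {
		pthread_join(tid[i], NULL);
		if (work[i].failed)
			err = 1;
		total += (long)work[i].nr_hits;
	}
	return err ? -1 : total;
}
```

The cost is one regcomp() per thread instead of one in total, which
should be negligible next to the time spent in regexec().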

- Fredrik