On Tue, Jan 26, 2010 at 00:59, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> The profile for the threaded case says:
>
>     51.73%  git  libc-2.11.1.so  [.] re_search_internal
>     11.47%  git  [kernel]        [k] copy_user_generic_string
>      2.90%  git  libc-2.11.1.so  [.] __strlen_sse2
>      2.66%  git  [kernel]        [k] link_path_walk
>      2.55%  git  [kernel]        [k] intel_pmu_enable_all
>      2.40%  git  [kernel]        [k] __d_lookup
>      1.71%  git  libc-2.11.1.so  [.] __GI___libc_malloc
>      1.55%  git  [kernel]        [k] _raw_spin_lock
>      1.43%  git  [kernel]        [k] sys_futex
>      1.30%  git  libc-2.11.1.so  [.] __cfree
>      1.28%  git  [kernel]        [k] intel_pmu_disable_all
>      1.25%  git  libc-2.11.1.so  [.] __GI_memchr
>      1.14%  git  libc-2.11.1.so  [.] _int_malloc
>      1.02%  git  [kernel]        [k] effective_load
>
> and the only thing that makes me go "eh?" there is the strlen(). Why is
> that so hot? But locking doesn't seem to be the biggest issue, and in
> general I think this is all pretty good. The 'effective_load' thing is the
> scheduler, so there's certainly some context switching going on, probably
> still due to excessive synchronization, but it's equally clear that that
> is certainly not a dominant factor.

I see the strlen in my profiles as well, but I haven't figured out
where it comes from. I get the following:

    51.16%  git-grep  /lib/tls/i686/cmov/libc-2.10.1.so  [.] 0x000000000b14c6
    10.12%  git-grep  /lib/tls/i686/cmov/libc-2.10.1.so  [.] __GI_strlen
     9.27%  git-grep  [kernel]                           [k] __copy_to_user_ll
     4.68%  git-grep  /lib/tls/i686/cmov/libc-2.10.1.so  [.] __memchr
     1.72%  git-grep  [kernel]                           [k] __d_lookup
     1.18%  git-grep  /lib/i686/cmov/libcrypto.so.0.9.8  [.] sha1_block_asm_data_order
     1.11%  git-grep  [kernel]                           [k] __ticket_spin_lock
     0.84%  git-grep  [vdso]                             [.] 0x00000000b6c422

If I use perf record -g I get

    10.39%  git-grep  /lib/tls/i686/cmov/libc-2.10.1.so  [.] __GI_strlen
                |
                |--99.05%-- look_ahead
                |          grep_buffer_1
                |          grep_buffer
                |          run
                |          start_thread
                |          __clone
                |
                |--0.64%-- grep_file
                |          grep_cache
                |          cmd_grep
                |          run_builtin
                |          handle_internal_command
                |          main
                |          __libc_start_main
                |          0x804ae81
                 --0.32%-- [...]

This doesn't make much sense to me as look_ahead doesn't call strlen
(I compiled git with -O0 to avoid any issues with inlined functions).
But I haven't used perf so much, so maybe I'm reading the output the
wrong way.

> One potentially interesting data point is that if I make NR_THREADS be 16,
> performance goes down, and I get more locking overhead. So NR_THREADS of 8
> works well on this machine.

Interesting. I get the best results with 8 threads as well, but I only
have two cores.

> One worry is, of course, whether all regex() implementations are
> thread-safe. Maybe there are broken libraries that have hidden global
> state in them?

That would certainly be a problem. A quick google search didn't show
any known bugs. Of course, this doesn't tell us anything about the
unknown ones.

- Fredrik
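On the thread-safety worry, one defensive arrangement is to give every thread its own compiled regex_t, so no pattern state is shared between threads even if a library's regexec() modifies the compiled pattern internally. The following is only a minimal sketch under that assumption, not the code from the patch under discussion: struct scan_job, scan_worker and count_matches_threaded are hypothetical names, and it splits an in-memory array of lines across threads rather than files. It does not protect against a library that keeps hidden global state outside the regex_t.

```c
#include <pthread.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical work item: one thread scans lines[begin..end) for `pattern`. */
struct scan_job {
	const char *pattern;
	const char *const *lines;
	size_t begin, end;
	size_t hits;	/* written only by the owning thread */
	int failed;	/* set when regcomp() rejects the pattern */
};

static void *scan_worker(void *arg)
{
	struct scan_job *job = arg;
	regex_t re;	/* private to this thread, never shared */
	size_t i;

	job->hits = 0;
	job->failed = regcomp(&re, job->pattern, REG_EXTENDED | REG_NOSUB) != 0;
	if (job->failed)
		return NULL;
	for (i = job->begin; i < job->end; i++)
		if (regexec(&re, job->lines[i], 0, NULL, 0) == 0)
			job->hits++;
	regfree(&re);
	return NULL;
}

/* Returns the number of matching lines, or -1 on any error. */
static long count_matches_threaded(const char *pattern,
				   const char *const *lines, size_t nr_lines,
				   int nr_threads)
{
	enum { MAX_THREADS = 16 };
	pthread_t tid[MAX_THREADS];
	struct scan_job job[MAX_THREADS];
	long total = 0;
	int started = 0, err = 0, t;

	if (nr_threads < 1 || nr_threads > MAX_THREADS)
		return -1;
	for (t = 0; t < nr_threads; t++) {
		job[t].pattern = pattern;
		job[t].lines = lines;
		/* Contiguous, non-overlapping slice for each thread. */
		job[t].begin = nr_lines * (size_t)t / (size_t)nr_threads;
		job[t].end = nr_lines * (size_t)(t + 1) / (size_t)nr_threads;
		job[t].hits = 0;
		job[t].failed = 0;
		if (pthread_create(&tid[t], NULL, scan_worker, &job[t]) != 0) {
			err = 1;
			break;
		}
		started++;
	}
	/* Join only the threads that were actually started. */
	for (t = 0; t < started; t++) {
		pthread_join(tid[t], NULL);
		if (job[t].failed)
			err = 1;
		total += (long)job[t].hits;
	}
	return err ? -1 : total;
}
```

The cost is one regcomp() per thread instead of one per process, which should be small next to the time spent in regexec() in the profiles above. On older glibc versions this needs to be built with -pthread.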