Ingo, At the time of switching the anon-vma tree's lock from mutex to rw-sem (commit 5a505085), we encountered regressions for fork heavy workload. A lot of optimizations to rw-sem (e.g. lock stealing) helped to mitigate the problem. I tried an experiment on the 3.10-rc4 kernel to compare the performance of rw-sem to one that uses mutex. I saw a 8% regression in throughput for rw-sem vs a mutex implementation in 3.10-rc4. For the experiments, I used the exim mail server workload in the MOSBENCH test suite on 4 socket (westmere) and a 4 socket (ivy bridge) with the number of clients sending mail equal to number of cores. The mail server will fork off a process to handle an incoming mail and put it into mail spool. The lock protecting the anon-vma tree is stressed due to heavy forking. On both machines, I saw that the mutex implementation has 8% more throughput. I've pinned the cpu frequency to maximum in the experiments. I've tried two separate tweaks to the rw-sem on 3.10-rc4. I've tested each tweak individually. 1) Add an owner field when a writer holds the lock and introduce optimistic spinning when an active writer is holding the semaphore. It reduced the context switching by 30% to a level very close to the mutex implementation. However, I did not see any throughput improvement of exim. 2) When the sem->count's active field is non-zero (i.e. someone is holding the lock), we can skip directly to the down_write_failed path, without adding the RWSEM_DOWN_WRITE_BIAS and taking it off again from sem->count, saving us two atomic operations. Since we will try the lock stealing again later, this should be okay. Unfortunately it did not improve the exim workload either. Any suggestions on the difference between rwsem and mutex performance and possible improvements to recover this regression? Thanks. Tim vmstat for mutex implementation: procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu----- r b swpd free buff cache si so bi bo in cs us sy id wa st 38 0 0 130957920 47860 199956 0 0 0 56 236342 476975 14 72 14 0 0 41 0 0 130938560 47860 219900 0 0 0 0 236816 479676 14 72 14 0 0 vmstat for rw-sem implementation (3.10-rc4) procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu----- r b swpd free buff cache si so bi bo in cs us sy id wa st 40 0 0 130933984 43232 202584 0 0 0 0 321817 690741 13 71 16 0 0 39 0 0 130913904 43232 224812 0 0 0 0 322193 692949 13 71 16 0 0 Profile for mutex implementation: 5.02% exim [kernel.kallsyms] [k] page_fault 3.67% exim [kernel.kallsyms] [k] anon_vma_interval_tree_insert 2.66% exim [kernel.kallsyms] [k] unmap_single_vma 2.15% exim [kernel.kallsyms] [k] do_raw_spin_lock 2.14% exim [kernel.kallsyms] [k] page_cache_get_speculative 2.04% exim [kernel.kallsyms] [k] copy_page_rep 1.58% exim [kernel.kallsyms] [k] clear_page_c 1.55% exim [kernel.kallsyms] [k] cpu_relax 1.55% exim [kernel.kallsyms] [k] mutex_unlock 1.42% exim [kernel.kallsyms] [k] __slab_free 1.16% exim [kernel.kallsyms] [k] mutex_lock 1.12% exim libc-2.13.so [.] vfprintf 0.99% exim [kernel.kallsyms] [k] find_vma 0.95% exim [kernel.kallsyms] [k] __list_del_entry Profile for rw-sem implementation 4.88% exim [kernel.kallsyms] [k] page_fault 3.43% exim [kernel.kallsyms] [k] anon_vma_interval_tree_insert 2.65% exim [kernel.kallsyms] [k] unmap_single_vma 2.46% exim [kernel.kallsyms] [k] do_raw_spin_lock 2.25% exim [kernel.kallsyms] [k] copy_page_rep 2.01% exim [kernel.kallsyms] [k] page_cache_get_speculative 1.81% exim [kernel.kallsyms] [k] clear_page_c 1.51% exim [kernel.kallsyms] [k] __slab_free 1.12% exim libc-2.13.so [.] vfprintf 1.06% exim [kernel.kallsyms] [k] __list_del_entry 1.02% swapper [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore 1.00% exim [kernel.kallsyms] [k] find_vma 0.93% exim [kernel.kallsyms] [k] mutex_unlock turbostat for mutex implementation: pk cor CPU %c0 GHz TSC %c1 %c3 %c6 CTMP %pc3 %pc6 82.91 2.39 2.39 11.65 2.76 2.68 51 0.00 0.00 turbostat of rw-sem implementation (3.10-rc4): pk cor CPU %c0 GHz TSC %c1 %c3 %c6 CTMP %pc3 %pc6 80.10 2.39 2.39 14.96 2.80 2.13 52 0.00 0.00 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>