On 10/20/2014 02:56 PM, Peter Zijlstra wrote: > Hi, > > I figured I'd give my 2010 speculative fault series another spin: > > https://lkml.org/lkml/2010/1/4/257 > > Since then I think many of the outstanding issues have changed sufficiently to > warrant another go. In particular Al Viro's delayed fput seems to have made it > entirely 'normal' to delay fput(). Lai Jiangshan's SRCU rewrite provided us > with call_srcu() and my preemptible mmu_gather removed the TLB flushes from > under the PTL. > > The code needs way more attention but builds a kernel and runs the > micro-benchmark so I figured I'd post it before sinking more time into it. > > I realize the micro-bench is about as good as it gets for this series and not > very realistic otherwise, but I think it does show the potential benefit the > approach has. Does this mean that an entire fault can complete without ever taking mmap_sem at all? If so, that's a *huge* win. I'm a bit concerned about drivers that assume that the vma is unchanged during .fault processing. In particular, is there a race between .close and .fault? Would it make sense to add a per-vma rw lock and hold it during vma modification and .fault calls? --Andy > > (patches go against .18-rc1+) > > --- > > Using Kamezawa's multi-fault micro-bench from: https://lkml.org/lkml/2010/1/6/28 > > My Ivy Bridge EP (2*10*2) has a ~58% improvement in pagefault throughput: > > PRE: > > root@ivb-ep:~# perf stat -e page-faults,cache-misses --repeat 5 ./multi-fault 20 > > Performance counter stats for './multi-fault 20' (5 runs): > > 149,441,555 page-faults ( +- 1.25% ) > 2,153,651,828 cache-misses ( +- 1.09% ) > > 60.003082014 seconds time elapsed ( +- 0.00% ) > > POST: > > root@ivb-ep:~# perf stat -e page-faults,cache-misses --repeat 5 ./multi-fault 20 > > Performance counter stats for './multi-fault 20' (5 runs): > > 236,442,626 page-faults ( +- 0.08% ) > 2,796,353,939 cache-misses ( +- 1.01% ) > > 60.002792431 seconds time elapsed ( +- 0.00% ) > > > My Ivy Bridge EX (4*15*2) has a ~78% improvement in pagefault throughput: > > PRE: > > root@ivb-ex:~# perf stat -e page-faults,cache-misses --repeat 5 ./multi-fault 60 > > Performance counter stats for './multi-fault 60' (5 runs): > > 105,789,078 page-faults ( +- 2.24% ) > 1,314,072,090 cache-misses ( +- 1.17% ) > > 60.009243533 seconds time elapsed ( +- 0.00% ) > > POST: > > root@ivb-ex:~# perf stat -e page-faults,cache-misses --repeat 5 ./multi-fault 60 > > Performance counter stats for './multi-fault 60' (5 runs): > > 187,751,767 page-faults ( +- 2.24% ) > 1,792,758,664 cache-misses ( +- 2.30% ) > > 60.011611579 seconds time elapsed ( +- 0.00% ) > > (I've not yet looked at why the EX sucks chunks compared to the EP box, I > suspect we contend on other locks, but it could be anything.) > > --- > > arch/x86/mm/fault.c | 35 ++- > include/linux/mm.h | 19 +- > include/linux/mm_types.h | 5 + > kernel/fork.c | 1 + > mm/init-mm.c | 1 + > mm/internal.h | 18 ++ > mm/memory.c | 672 ++++++++++++++++++++++++++++------------------- > mm/mmap.c | 101 +++++-- > 8 files changed, 544 insertions(+), 308 deletions(-) > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@xxxxxxxxx. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a> > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>