On Tue, Mar 13, 2018 at 06:59:47PM +0100, Laurent Dufour wrote:
> This change is inspired by Peter's proposal patch [1], which was
> protecting the VMA using SRCU. Unfortunately, SRCU does not scale well
> in that particular case, and it introduces major performance
> degradation due to excessive scheduling operations.

Do you happen to have a little more detail on that?

> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 34fde7111e88..28c763ea1036 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -335,6 +335,7 @@ struct vm_area_struct {
>  	struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
>  #ifdef CONFIG_SPECULATIVE_PAGE_FAULT
>  	seqcount_t vm_sequence;
> +	atomic_t vm_ref_count;		/* see vma_get(), vma_put() */
>  #endif
>  } __randomize_layout;
>  
> @@ -353,6 +354,9 @@ struct kioctx_table;
>  struct mm_struct {
>  	struct vm_area_struct *mmap;		/* list of VMAs */
>  	struct rb_root mm_rb;
> +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
> +	rwlock_t mm_rb_lock;
> +#endif
>  	u32 vmacache_seqnum;                   /* per-thread vmacache */
>  #ifdef CONFIG_MMU
>  	unsigned long (*get_unmapped_area) (struct file *filp,

When I tried this, it simply traded contention on mmap_sem for
contention on these two cachelines.

This was for the concurrent fault benchmark, where mmap_sem is only
ever acquired for reading (so no blocking ever happens) and the
bottleneck was really pure cacheline access. Only by using RCU can you
avoid that thrashing.

Also note that if your database allocates the one giant mapping, it'll
be _one_ VMA and that vm_ref_count gets _very_ hot indeed.
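
To make the cacheline argument concrete, here is a rough sketch, not
the actual patch; find_vma_rcu() and handle_speculative_fault() are
made-up names standing in for whatever the series ends up using:

	/*
	 * What the refcounted lookup boils down to: the rwlock and
	 * the refcount both live in memory shared by every faulting
	 * CPU, so each speculative fault does at least three atomic
	 * RMWs on contended cachelines.
	 */
	static struct vm_area_struct *vma_get(struct mm_struct *mm,
					      unsigned long addr)
	{
		struct vm_area_struct *vma;

		read_lock(&mm->mm_rb_lock);	/* RMW on mm_rb_lock line */
		vma = find_vma(mm, addr);
		if (vma)
			atomic_inc(&vma->vm_ref_count); /* RMW on VMA line */
		read_unlock(&mm->mm_rb_lock);	/* RMW on mm_rb_lock again */

		return vma;
	}

	/*
	 * An RCU reader, by contrast, writes nothing shared at all;
	 * all the work happens under rcu_read_lock(), and validity is
	 * checked against vm_sequence instead of pinning the VMA:
	 */
	static int speculative_fault(struct mm_struct *mm,
				     unsigned long addr)
	{
		struct vm_area_struct *vma;
		int ret = VM_FAULT_RETRY;

		rcu_read_lock();
		vma = find_vma_rcu(mm, addr);	/* no stores to shared lines */
		if (vma)
			ret = handle_speculative_fault(vma, addr);
		rcu_read_unlock();

		return ret;
	}

In the first variant, with that single giant mapping, every CPU's
atomic_inc() lands on the very same vm_ref_count cacheline, which is
exactly the ping-pong the RCU reader avoids.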