Re: [PATCH v2 9/9] tcp: Use per-vma locking for receive zerocopy

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Eric?  Arjun?  Any comments?

On Tue, Jul 11, 2023 at 09:20:47PM +0100, Matthew Wilcox (Oracle) wrote:
> From: Arjun Roy <arjunroy@xxxxxxxxxx>
> 
> Per-VMA locking allows us to lock a struct vm_area_struct without
> taking the process-wide mmap lock in read mode.
> 
> Consider a process workload where the mmap lock is taken constantly in
> write mode. In this scenario, all zerocopy receives are periodically
> blocked during that period of time - though in principle, the memory
> ranges being used by TCP are not touched by the operations that need
> the mmap write lock. This results in performance degradation.
> 
> Now consider another workload where the mmap lock is never taken in
> write mode, but there are many TCP connections using receive zerocopy
> that are concurrently receiving. These connections all take the mmap
> lock in read mode, but this does induce a lot of contention and atomic
> ops for this process-wide lock. This results in additional CPU
> overhead caused by contending on the cache line for this lock.
> 
> However, with per-vma locking, both of these problems can be avoided.
> 
> As a test, I ran an RPC-style request/response workload with 4KB
> payloads and receive zerocopy enabled, with 100 simultaneous TCP
> connections. I measured perf cycles within the
> find_tcp_vma/mmap_read_lock/mmap_read_unlock codepath, with and
> without per-vma locking enabled.
> 
> When using process-wide mmap semaphore read locking, about 1% of
> measured perf cycles were within this path. With per-VMA locking, this
> value dropped to about 0.45%.
> 
> Signed-off-by: Arjun Roy <arjunroy@xxxxxxxxxx>
> Reviewed-by: Eric Dumazet <edumazet@xxxxxxxxxx>
> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
> ---
>  net/ipv4/tcp.c | 39 ++++++++++++++++++++++++++++++++-------
>  1 file changed, 32 insertions(+), 7 deletions(-)
> 
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 1542de3f66f7..7118ec6cf886 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -2038,6 +2038,30 @@ static void tcp_zc_finalize_rx_tstamp(struct sock *sk,
>  	}
>  }
>  
> +static struct vm_area_struct *find_tcp_vma(struct mm_struct *mm,
> +		unsigned long address, bool *mmap_locked)
> +{
> +	struct vm_area_struct *vma = lock_vma_under_rcu(mm, address);
> +
> +	if (vma) {
> +		if (vma->vm_ops != &tcp_vm_ops) {
> +			vma_end_read(vma);
> +			return NULL;
> +		}
> +		*mmap_locked = false;
> +		return vma;
> +	}
> +
> +	mmap_read_lock(mm);
> +	vma = vma_lookup(mm, address);
> +	if (!vma || vma->vm_ops != &tcp_vm_ops) {
> +		mmap_read_unlock(mm);
> +		return NULL;
> +	}
> +	*mmap_locked = true;
> +	return vma;
> +}
> +
>  #define TCP_ZEROCOPY_PAGE_BATCH_SIZE 32
>  static int tcp_zerocopy_receive(struct sock *sk,
>  				struct tcp_zerocopy_receive *zc,
> @@ -2055,6 +2079,7 @@ static int tcp_zerocopy_receive(struct sock *sk,
>  	u32 seq = tp->copied_seq;
>  	u32 total_bytes_to_map;
>  	int inq = tcp_inq(sk);
> +	bool mmap_locked;
>  	int ret;
>  
>  	zc->copybuf_len = 0;
> @@ -2079,13 +2104,10 @@ static int tcp_zerocopy_receive(struct sock *sk,
>  		return 0;
>  	}
>  
> -	mmap_read_lock(current->mm);
> -
> -	vma = vma_lookup(current->mm, address);
> -	if (!vma || vma->vm_ops != &tcp_vm_ops) {
> -		mmap_read_unlock(current->mm);
> +	vma = find_tcp_vma(current->mm, address, &mmap_locked);
> +	if (!vma)
>  		return -EINVAL;
> -	}
> +
>  	vma_len = min_t(unsigned long, zc->length, vma->vm_end - address);
>  	avail_len = min_t(u32, vma_len, inq);
>  	total_bytes_to_map = avail_len & ~(PAGE_SIZE - 1);
> @@ -2159,7 +2181,10 @@ static int tcp_zerocopy_receive(struct sock *sk,
>  						   zc, total_bytes_to_map);
>  	}
>  out:
> -	mmap_read_unlock(current->mm);
> +	if (mmap_locked)
> +		mmap_read_unlock(current->mm);
> +	else
> +		vma_end_read(vma);
>  	/* Try to copy straggler data. */
>  	if (!ret)
>  		copylen = tcp_zc_handle_leftover(zc, sk, skb, &seq, copybuf_len, tss);
> -- 
> 2.39.2
> 




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux