Re: [PATCH v2 9/9] tcp: Use per-vma locking for receive zerocopy

On Tue, Jul 11, 2023 at 1:21 PM Matthew Wilcox (Oracle)
<willy@xxxxxxxxxxxxx> wrote:
>
> From: Arjun Roy <arjunroy@xxxxxxxxxx>
>
> Per-VMA locking allows us to lock a struct vm_area_struct without
> taking the process-wide mmap lock in read mode.
>
> Consider a process workload where the mmap lock is constantly taken
> in write mode. In this scenario, every zerocopy receive is blocked
> for as long as the write lock is held - even though, in principle,
> the memory ranges used by TCP are not touched by the operations that
> need the mmap write lock. This results in performance degradation.
>
> Now consider another workload where the mmap lock is never taken in
> write mode, but there are many TCP connections using receive zerocopy
> that are concurrently receiving. These connections all take the mmap
> lock in read mode, which still induces heavy contention and atomic
> operations on this process-wide lock, adding CPU overhead from
> bouncing its cache line between cores.
>
> However, with per-vma locking, both of these problems can be avoided.
>
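For anyone reading along: the pattern described above is what
find_tcp_vma() below implements - try the per-VMA read lock first, and
fall back to the process-wide mmap read lock only if that fails. As a
rough, illustrative sketch of the shape (validation and unlock paths
elided; see the real function below):

	vma = lock_vma_under_rcu(mm, address);	/* fast path, no mmap lock */
	if (!vma) {
		mmap_read_lock(mm);		/* fallback: mmap read lock */
		vma = vma_lookup(mm, address);
		/* ... validate vma, or unlock and return NULL ... */
	}
	/* remember which lock is held so the right one gets dropped */
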
> As a test, I ran an RPC-style request/response workload with 4KB
> payloads and receive zerocopy enabled, with 100 simultaneous TCP
> connections. I measured perf cycles within the
> find_tcp_vma/mmap_read_lock/mmap_read_unlock codepath, with and
> without per-vma locking enabled.
>
> When using the process-wide mmap semaphore for read locking, about 1%
> of measured perf cycles fell within this path. With per-VMA locking,
> that figure dropped to about 0.45%.
>
> Signed-off-by: Arjun Roy <arjunroy@xxxxxxxxxx>
> Reviewed-by: Eric Dumazet <edumazet@xxxxxxxxxx>
> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>

Seems to match the original version with less churn.

Reviewed-by: Suren Baghdasaryan <surenb@xxxxxxxxxx>

> ---
>  net/ipv4/tcp.c | 39 ++++++++++++++++++++++++++++++++-------
>  1 file changed, 32 insertions(+), 7 deletions(-)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 1542de3f66f7..7118ec6cf886 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -2038,6 +2038,30 @@ static void tcp_zc_finalize_rx_tstamp(struct sock *sk,
>         }
>  }
>

nit: Maybe add a comment that mmap_locked value is
undefined/meaningless if the function returns NULL?
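
Something like this, perhaps (wording only a suggestion, based on the
behaviour below):

	/*
	 * Returns a read-locked VMA suitable for TCP zerocopy receive, or
	 * NULL. On success, *mmap_locked tells the caller which lock is
	 * held: false means the VMA itself is read-locked (release it
	 * with vma_end_read()), true means the mmap read lock is held
	 * (release it with mmap_read_unlock()). *mmap_locked is not
	 * written when NULL is returned, so its value is meaningless in
	 * that case.
	 */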

> +static struct vm_area_struct *find_tcp_vma(struct mm_struct *mm,
> +               unsigned long address, bool *mmap_locked)
> +{
> +       struct vm_area_struct *vma = lock_vma_under_rcu(mm, address);
> +
> +       if (vma) {
> +               if (vma->vm_ops != &tcp_vm_ops) {
> +                       vma_end_read(vma);
> +                       return NULL;
> +               }
> +               *mmap_locked = false;
> +               return vma;
> +       }
> +
> +       mmap_read_lock(mm);
> +       vma = vma_lookup(mm, address);
> +       if (!vma || vma->vm_ops != &tcp_vm_ops) {
> +               mmap_read_unlock(mm);
> +               return NULL;
> +       }
> +       *mmap_locked = true;
> +       return vma;
> +}
> +
>  #define TCP_ZEROCOPY_PAGE_BATCH_SIZE 32
>  static int tcp_zerocopy_receive(struct sock *sk,
>                                 struct tcp_zerocopy_receive *zc,
> @@ -2055,6 +2079,7 @@ static int tcp_zerocopy_receive(struct sock *sk,
>         u32 seq = tp->copied_seq;
>         u32 total_bytes_to_map;
>         int inq = tcp_inq(sk);
> +       bool mmap_locked;
>         int ret;
>
>         zc->copybuf_len = 0;
> @@ -2079,13 +2104,10 @@ static int tcp_zerocopy_receive(struct sock *sk,
>                 return 0;
>         }
>
> -       mmap_read_lock(current->mm);
> -
> -       vma = vma_lookup(current->mm, address);
> -       if (!vma || vma->vm_ops != &tcp_vm_ops) {
> -               mmap_read_unlock(current->mm);
> +       vma = find_tcp_vma(current->mm, address, &mmap_locked);
> +       if (!vma)
>                 return -EINVAL;
> -       }
> +
>         vma_len = min_t(unsigned long, zc->length, vma->vm_end - address);
>         avail_len = min_t(u32, vma_len, inq);
>         total_bytes_to_map = avail_len & ~(PAGE_SIZE - 1);
> @@ -2159,7 +2181,10 @@ static int tcp_zerocopy_receive(struct sock *sk,
>                                                    zc, total_bytes_to_map);
>         }
>  out:
> -       mmap_read_unlock(current->mm);
> +       if (mmap_locked)
> +               mmap_read_unlock(current->mm);
> +       else
> +               vma_end_read(vma);
>         /* Try to copy straggler data. */
>         if (!ret)
>                 copylen = tcp_zc_handle_leftover(zc, sk, skb, &seq, copybuf_len, tss);
> --
> 2.39.2
>



