On Mon, 27 Jan 2020 18:59:58 -0800 Arjun Roy <arjunroy.kdev@xxxxxxxxx> wrote:

> Use vm_insert_pages() for tcp receive zerocopy. Spin lock cycles
> (as reported by perf) drop from a couple of percentage points
> to a fraction of a percent. This results in a roughly 6% increase in
> efficiency, measured roughly as zerocopy receive count divided by CPU
> utilization.
>
> The intention of this patch-set is to reduce atomic ops for
> tcp zerocopy receives, which normally hits the same spinlock multiple
> times consecutively.

For some reason the patch causes this:

In file included from ./arch/x86/include/asm/atomic.h:5:0,
                 from ./include/linux/atomic.h:7,
                 from ./include/linux/crypto.h:15,
                 from ./include/crypto/hash.h:11,
                 from net/ipv4/tcp.c:246:
net/ipv4/tcp.c: In function ‘do_tcp_getsockopt.isra.29’:
./include/linux/compiler.h:225:31: warning: ‘tp’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  case 4: *(volatile __u32 *)p = *(__u32 *)res; break;
          ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
net/ipv4/tcp.c:1779:19: note: ‘tp’ was declared here
  struct tcp_sock *tp;
                   ^~

It's a false positive.  gcc-7.2.0:

: out:
:	up_read(&current->mm->mmap_sem);
:	if (length) {
:		WRITE_ONCE(tp->copied_seq, seq);

but `length' is zero here.

This suppresses it:

--- a/net/ipv4/tcp.c~net-zerocopy-use-vm_insert_pages-for-tcp-rcv-zerocopy-fix
+++ a/net/ipv4/tcp.c
@@ -1788,6 +1788,8 @@ static int tcp_zerocopy_receive(struct s
 
 	sock_rps_record_flow(sk);
 
+	tp = tcp_sk(sk);
+
 	down_read(&current->mm->mmap_sem);
 
 	ret = -EINVAL;
@@ -1796,7 +1798,6 @@ static int tcp_zerocopy_receive(struct s
 		goto out;
 	zc->length = min_t(unsigned long, zc->length, vma->vm_end - address);
 
-	tp = tcp_sk(sk);
 	seq = tp->copied_seq;
 	inq = tcp_inq(sk);
 	zc->length = min_t(u32, zc->length, inq);

and I guess it's zero-cost.

Anyway, I'll sit on this lot for a while, hoping for a davem ack?
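
In case it helps to see why gcc gets this wrong: below is a minimal,
self-contained userspace sketch of the same shape (hypothetical names
like fake_tcp_sk() and fake_receive(), not kernel code).  Whether the
warning actually fires depends on the gcc version and optimization
level, but the pattern matches tcp_zerocopy_receive(): the pointer is
assigned only after the early-exit check, and dereferenced only when a
counter that can only become non-zero after that assignment is set, so
the code is fine but gcc cannot prove the two conditions are
correlated.

/*
 * Sketch of the -Wmaybe-uninitialized false positive (hypothetical
 * names, not kernel code).  Hoisting the assignment of `tp' above the
 * early exit, as the fixup above does with tp = tcp_sk(sk), makes the
 * initialization unconditional and silences the warning.
 */
#include <stdio.h>

struct fake_sock { unsigned int copied_seq; };

static struct fake_sock fake = { .copied_seq = 42 };

static struct fake_sock *fake_tcp_sk(int ok)	/* stand-in for tcp_sk() */
{
	return ok ? &fake : NULL;
}

static int fake_receive(int have_vma)	/* stand-in for tcp_zerocopy_receive() */
{
	struct fake_sock *tp;		/* gcc: "may be used uninitialized" */
	unsigned int length = 0;
	int ret = -22;			/* -EINVAL */

	if (!have_vma)
		goto out;		/* length is still zero on this path */

	tp = fake_tcp_sk(have_vma);	/* moving this above the check is the fix */
	length = tp->copied_seq;
	ret = 0;
out:
	if (length)			/* only true when tp was assigned */
		tp->copied_seq = length + 1;	/* the warning points at this store */
	return ret;
}

int main(void)
{
	printf("ok path: %d, error path: %d\n", fake_receive(1), fake_receive(0));
	return 0;
}

And since tcp_sk() is just a cast of the sock pointer, doing it before
the mmap_sem/vma checks shouldn't cost anything, which fits the
"zero-cost" observation above.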