On Mon, Feb 28, 2022 at 12:06 PM Eric Dumazet <eric.dumazet@xxxxxxxxx> wrote: > > > On 2/25/22 01:02, Harold Huang wrote: > > In tun, NAPI is supported and we can also use NAPI in the path of > > batched XDP buffs to accelerate packet processing. What is more, after > > we use NAPI, GRO is also supported. The iperf shows that the throughput of > > single stream could be improved from 4.5Gbps to 9.2Gbps. Additionally, 9.2 > > Gbps nearly reachs the line speed of the phy nic and there is still about > > 15% idle cpu core remaining on the vhost thread. > > > > Test topology: > > > > [iperf server]<--->tap<--->dpdk testpmd<--->phy nic<--->[iperf client] > > > > Iperf stream: > > > > Before: > > ... > > [ 5] 5.00-6.00 sec 558 MBytes 4.68 Gbits/sec 0 1.50 MBytes > > [ 5] 6.00-7.00 sec 556 MBytes 4.67 Gbits/sec 1 1.35 MBytes > > [ 5] 7.00-8.00 sec 556 MBytes 4.67 Gbits/sec 2 1.18 MBytes > > [ 5] 8.00-9.00 sec 559 MBytes 4.69 Gbits/sec 0 1.48 MBytes > > [ 5] 9.00-10.00 sec 556 MBytes 4.67 Gbits/sec 1 1.33 MBytes > > - - - - - - - - - - - - - - - - - - - - - - - - - > > [ ID] Interval Transfer Bitrate Retr > > [ 5] 0.00-10.00 sec 5.39 GBytes 4.63 Gbits/sec 72 sender > > [ 5] 0.00-10.04 sec 5.39 GBytes 4.61 Gbits/sec receiver > > > > After: > > ... > > [ 5] 5.00-6.00 sec 1.07 GBytes 9.19 Gbits/sec 0 1.55 MBytes > > [ 5] 6.00-7.00 sec 1.08 GBytes 9.30 Gbits/sec 0 1.63 MBytes > > [ 5] 7.00-8.00 sec 1.08 GBytes 9.25 Gbits/sec 0 1.72 MBytes > > [ 5] 8.00-9.00 sec 1.08 GBytes 9.25 Gbits/sec 77 1.31 MBytes > > [ 5] 9.00-10.00 sec 1.08 GBytes 9.24 Gbits/sec 0 1.48 MBytes > > - - - - - - - - - - - - - - - - - - - - - - - - - > > [ ID] Interval Transfer Bitrate Retr > > [ 5] 0.00-10.00 sec 10.8 GBytes 9.28 Gbits/sec 166 sender > > [ 5] 0.00-10.04 sec 10.8 GBytes 9.24 Gbits/sec receiver > > .... > > > > Reported-at: https://lore.kernel.org/all/CACGkMEvTLG0Ayg+TtbN4q4pPW-ycgCCs3sC3-TF8cuRTf7Pp1A@xxxxxxxxxxxxxx > > Signed-off-by: Harold Huang <baymaxhuang@xxxxxxxxx> > > --- > > v1 -> v2 > > - fix commit messages > > - add queued flag to avoid void unnecessary napi suggested by Jason > > > > drivers/net/tun.c | 20 ++++++++++++++++---- > > 1 file changed, 16 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c > > index fed85447701a..c7d8b7c821d8 100644 > > --- a/drivers/net/tun.c > > +++ b/drivers/net/tun.c > > @@ -2379,7 +2379,7 @@ static void tun_put_page(struct tun_page *tpage) > > } > > > > static int tun_xdp_one(struct tun_struct *tun, > > - struct tun_file *tfile, > > + struct tun_file *tfile, int *queued, > > struct xdp_buff *xdp, int *flush, > > struct tun_page *tpage) > > { > > @@ -2388,6 +2388,7 @@ static int tun_xdp_one(struct tun_struct *tun, > > struct virtio_net_hdr *gso = &hdr->gso; > > struct bpf_prog *xdp_prog; > > struct sk_buff *skb = NULL; > > + struct sk_buff_head *queue; > > u32 rxhash = 0, act; > > int buflen = hdr->buflen; > > int err = 0; > > @@ -2464,7 +2465,15 @@ static int tun_xdp_one(struct tun_struct *tun, > > !tfile->detached) > > rxhash = __skb_get_hash_symmetric(skb); > > > > - netif_receive_skb(skb); > > + if (tfile->napi_enabled) { > > + queue = &tfile->sk.sk_write_queue; > > + spin_lock(&queue->lock); > > + __skb_queue_tail(queue, skb); > > + spin_unlock(&queue->lock); > > + (*queued)++; > > + } else { > > + netif_receive_skb(skb); > > + } > > > > /* No need to disable preemption here since this function is > > * always called with bh disabled > > @@ -2492,7 +2501,7 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len) > > if (ctl && (ctl->type == TUN_MSG_PTR)) { > > struct tun_page tpage; > > int n = ctl->num; > > - int flush = 0; > > + int flush = 0, queued = 0; > > > > memset(&tpage, 0, sizeof(tpage)); > > > > @@ -2501,12 +2510,15 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len) > > > > for (i = 0; i < n; i++) { > > xdp = &((struct xdp_buff *)ctl->ptr)[i]; > > - tun_xdp_one(tun, tfile, xdp, &flush, &tpage); > > + tun_xdp_one(tun, tfile, &queued, xdp, &flush, &tpage); > > > How big n can be ? > > BTW I could not find where m->msg_controllen was checked in tun_sendmsg(). > > struct tun_msg_ctl *ctl = m->msg_control; > > if (ctl && (ctl->type == TUN_MSG_PTR)) { > > int n = ctl->num; // can be set to values in [0..65535] > > for (i = 0; i < n; i++) { > > xdp = &((struct xdp_buff *)ctl->ptr)[i]; > > > I really do not understand how we prevent malicious user space from > crashing the kernel. It looks to me the only user for this is vhost-net which limits it to 64, userspace can't use sendmsg() directly on tap. Thanks > > > > > } > > > > if (flush) > > xdp_do_flush(); > > > > + if (tfile->napi_enabled && queued > 0) > > + napi_schedule(&tfile->napi); > > + > > rcu_read_unlock(); > > local_bh_enable(); > > >