Re: [PATCH net-next v2 2/5] virtio-net: transmit napi

On 2017-04-20 21:58, Willem de Bruijn wrote:
On Thu, Apr 20, 2017 at 2:27 AM, Jason Wang <jasowang@xxxxxxxxxx> wrote:

On 2017-04-19 04:21, Willem de Bruijn wrote:
+static void virtnet_napi_tx_enable(struct virtnet_info *vi,
+                                  struct virtqueue *vq,
+                                  struct napi_struct *napi)
+{
+       if (!napi->weight)
+               return;
+
+       if (!vi->affinity_hint_set) {
+               napi->weight = 0;
+               return;
+       }
+
+       return virtnet_napi_enable(vq, napi);
+}
+
   static void refill_work(struct work_struct *work)

Maybe I was wrong, but according to Michael's comment it looks like he wants
to check affinity_hint_set just for the speculative tx polling on rx napi,
instead of disabling tx napi entirely.

And I'm not convinced this is really needed: the driver only provides an
affinity hint rather than setting the affinity, so it's not guaranteed that
the tx and rx interrupts land on the same vcpu.
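
For illustration, a rough sketch of that narrower check is below. It gates
only the opportunistic tx cleaning done from the rx napi handler on the
affinity hint, instead of disabling tx napi entirely. The helper and field
names (vq2rxq, free_old_xmit_skbs, affinity_hint_set, sq->napi) follow the
existing driver and this series; the snippet itself is an assumption about
the alternative, not the submitted patch.

static void virtnet_poll_cleantx(struct receive_queue *rq)
{
       struct virtnet_info *vi = rq->vq->vdev->priv;
       unsigned int index = vq2rxq(rq->vq);
       struct send_queue *sq = &vi->sq[index];
       struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, index);

       /* tx napi is not enabled for this queue */
       if (!sq->napi.weight)
               return;

       /* only clean tx from the rx handler when the affinity hint
        * suggests tx and rx interrupts share a cpu
        */
       if (!vi->affinity_hint_set)
               return;

       if (__netif_tx_trylock(txq)) {
               free_old_xmit_skbs(sq);
               __netif_tx_unlock(txq);
       }

       if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
               netif_tx_wake_queue(txq);
}
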
You're right. I made the restriction broader than what was requested, to err
on the side of caution for the initial merge of napi tx. And enabling the
optimization is still a win over keeping it off, even without irq affinity.

The cycle cost is significant without affinity regardless of whether the
optimization is used.

Yes, I noticed this in the past too.

This is not limited to napi-tx, though it is more
pronounced in that mode than without napi.

1x TCP_RR for affinity configuration {process, rx_irq, tx_irq}:

upstream:

1,1,1: 28985 Mbps, 278 Gcyc
1,0,2: 30067 Mbps, 402 Gcyc

napi tx:

1,1,1: 34492 Mbps, 269 Gcyc
1,0,2: 36527 Mbps, 537 Gcyc (!)
1,0,1: 36269 Mbps, 394 Gcyc
1,0,0: 34674 Mbps, 402 Gcyc

This is a particularly strong example. It is also representative
of most RR tests. It is less pronounced in other streaming tests.
10x TCP_RR, for instance:

upstream:

1,1,1: 42267 Mbps, 301 Gcyc
1,0,2: 40663 Mbps, 445 Gcyc

napi tx:

1,1,1: 42420 Mbps, 303 Gcyc
1,0,2: 42267 Mbps, 431 Gcyc

These numbers were obtained with the virtqueue_enable_cb_delayed
optimization placed after xmit_skb, by the way. It turns out that moving
that call before xmit_skb increases 1x TCP_RR further to ~39 Gbps, at the
cost of reducing 100x TCP_RR a bit.
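
For reference, a simplified sketch of the placement being compared is below:
the delayed completion callback is re-armed after the skb has been added to
the ring, with the alternative being to call it before xmit_skb. Error
handling, the stop-queue logic, and the kick heuristics of the real
start_xmit are omitted; this is an illustration of where the call sits, not
the exact patch.

static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
{
       struct virtnet_info *vi = netdev_priv(dev);
       struct send_queue *sq = &vi->sq[skb_get_queue_mapping(skb)];
       bool use_napi = sq->napi.weight;

       /* reclaim completed descriptors before queueing the new packet */
       free_old_xmit_skbs(sq);

       /* variant measured above: re-arm the delayed tx completion
        * callback only after xmit_skb has added the packet to the ring
        */
       xmit_skb(sq, skb);

       if (use_napi)
               virtqueue_enable_cb_delayed(sq->vq);

       virtqueue_kick(sq->vq);
       return NETDEV_TX_OK;
}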

I see, so I think we can leave the affinity hint optimization/check for future investigation:

- to avoid endless optimization for this series (e.g. we may want to share a single vector/napi for a tx/rx queue pair in the future).
- tx napi is disabled by default, which means we can do further optimization on top.

Thanks



