Re: [PATCH net] virtio-net: suppress bad irq warning for tx napi

On 2021/2/9 3:08 AM, Willem de Bruijn wrote:
On Sun, Feb 7, 2021 at 10:29 PM Jason Wang <jasowang@xxxxxxxxxx> wrote:

On 2021/2/5 4:50 AM, Willem de Bruijn wrote:
On Wed, Feb 3, 2021 at 10:06 PM Jason Wang <jasowang@xxxxxxxxxx> wrote:
On 2021/2/4 2:28 AM, Willem de Bruijn wrote:
On Wed, Feb 3, 2021 at 12:33 AM Jason Wang <jasowang@xxxxxxxxxx> wrote:
On 2021/2/2 10:37 PM, Willem de Bruijn wrote:
On Mon, Feb 1, 2021 at 10:09 PM Jason Wang <jasowang@xxxxxxxxxx> wrote:
On 2021/1/29 8:21 AM, Wei Wang wrote:
With the implementation of napi-tx in virtio driver, we clean tx
descriptors from rx napi handler, for the purpose of reducing tx
complete interrupts. But this could introduce a race where tx complete
interrupt has been raised, but the handler found there is no work to do
because we have done the work in the previous rx interrupt handler.
This could lead to the following warning msg:
[ 3588.010778] irq 38: nobody cared (try booting with the
"irqpoll" option)
[ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted
5.3.0-19-generic #20~18.04.2-Ubuntu
[ 3588.017940] Call Trace:
[ 3588.017942]  <IRQ>
[ 3588.017951]  dump_stack+0x63/0x85
[ 3588.017953]  __report_bad_irq+0x35/0xc0
[ 3588.017955]  note_interrupt+0x24b/0x2a0
[ 3588.017956]  handle_irq_event_percpu+0x54/0x80
[ 3588.017957]  handle_irq_event+0x3b/0x60
[ 3588.017958]  handle_edge_irq+0x83/0x1a0
[ 3588.017961]  handle_irq+0x20/0x30
[ 3588.017964]  do_IRQ+0x50/0xe0
[ 3588.017966]  common_interrupt+0xf/0xf
[ 3588.017966]  </IRQ>
[ 3588.017989] handlers:
[ 3588.020374] [<000000001b9f1da8>] vring_interrupt
[ 3588.025099] Disabling IRQ #38

This patch adds a new param to struct vring_virtqueue, and we set it for
tx virtqueues if napi-tx is enabled, to suppress the warning in such
case.
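
The race described above can be illustrated with a small userspace model.
This is only a sketch, not kernel code: struct txq_model, rx_napi_poll(),
tx_interrupt() and SPURIOUS_LIMIT are hypothetical names, and the limit is
a simplification of the accounting the kernel does in note_interrupt().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the race; not the kernel implementation. */
enum irq_ret { MODEL_IRQ_NONE = 0, MODEL_IRQ_HANDLED = 1 };

struct txq_model {
	int used;          /* completed tx descriptors waiting to be freed */
	int unhandled;     /* tx interrupts that found nothing to do */
	bool irq_disabled; /* set once the model gives up on the IRQ */
};

/* Assumed limit, for illustration only; the kernel's real accounting
 * in note_interrupt() differs in detail.
 */
#define SPURIOUS_LIMIT 100

/* The rx napi handler also frees completed tx descriptors, i.e. it
 * "steals" the work from the tx completion interrupt.
 */
static int rx_napi_poll(struct txq_model *q)
{
	int freed = q->used;

	q->used = 0;
	return freed;
}

/* The tx completion interrupt reports IRQ_NONE when rx already did the work. */
static enum irq_ret tx_interrupt(struct txq_model *q)
{
	assert(!q->irq_disabled); /* a disabled IRQ never reaches the handler */

	if (q->used == 0) {
		if (++q->unhandled >= SPURIOUS_LIMIT)
			q->irq_disabled = true; /* "nobody cared" */
		return MODEL_IRQ_NONE;
	}

	q->used = 0;
	return MODEL_IRQ_HANDLED;
}
```

Calling rx_napi_poll() before every tx_interrupt() makes each tx interrupt
return MODEL_IRQ_NONE, and the model disables the IRQ once the assumed limit
is reached, which mirrors the "Disabling IRQ #38" line in the trace.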

Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi")
Reported-by: Rick Jones <jonesrick@xxxxxxxxxx>
Signed-off-by: Wei Wang <weiwan@xxxxxxxxxx>
Signed-off-by: Willem de Bruijn <willemb@xxxxxxxxxx>
Please use get_maintainer.pl to make sure Michael and I are CCed.
Will do. Sorry about that. I suggested just the virtualization list, my bad.

---
      drivers/net/virtio_net.c     | 19 ++++++++++++++-----
      drivers/virtio/virtio_ring.c | 16 ++++++++++++++++
      include/linux/virtio.h       |  2 ++
      3 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 508408fbe78f..e9a3f30864e8 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1303,13 +1303,22 @@ static void virtnet_napi_tx_enable(struct virtnet_info *vi,
                  return;
          }

+     /* With napi_tx enabled, free_old_xmit_skbs() could be called from
+      * rx napi handler. Set work_steal to suppress bad irq warning for
+      * IRQ_NONE case from tx complete interrupt handler.
+      */
+     virtqueue_set_work_steal(vq, true);
+
          return virtnet_napi_enable(vq, napi);
Do we need to force the ordering between steal set and napi enable?
The warning only occurs after one hundred spurious interrupts, so not
really.
Ok, so it looks like a hint. Then I wonder how much value there is in
introducing a helper like virtqueue_set_work_steal() that allows the
caller to toggle it. How about disabling the check permanently during
virtqueue initialization?
Yes, that is even simpler.

We still need the helper, as the internal variables of vring_virtqueue
are not accessible from virtio-net. An earlier patch added the
variable to virtqueue itself, but I think it belongs in
vring_virtqueue. And the helper is not a lot of code.
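
As a rough illustration of why the helper is small, here is a userspace
sketch of a per-queue flag plus a setter. The virtio_ring.c hunk is not
quoted in this thread, so everything below is an assumption: the struct,
model_set_work_steal() and model_vring_interrupt() are hypothetical
stand-ins, and the way the flag is consumed is a guess at the intent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the actual patch. */
enum irq_ret { MODEL_IRQ_NONE = 0, MODEL_IRQ_HANDLED = 1 };

struct vring_virtqueue_model {
	int pending;     /* used buffers not yet processed */
	bool work_steal; /* another context may consume the buffers first */
};

/* Modeled after the helper named in the patch: the flag lives in the
 * ring-private struct, so the driver needs a setter to reach it.
 */
static void model_set_work_steal(struct vring_virtqueue_model *vq, bool steal)
{
	assert(vq != NULL);
	vq->work_steal = steal;
}

/* Interrupt handler sketch: an empty queue is reported as unhandled
 * only when nobody else is allowed to steal the work.
 */
static enum irq_ret model_vring_interrupt(struct vring_virtqueue_model *vq)
{
	if (vq->pending == 0)
		return vq->work_steal ? MODEL_IRQ_HANDLED : MODEL_IRQ_NONE;

	vq->pending = 0; /* stand-in for invoking the virtqueue callback */
	return MODEL_IRQ_HANDLED;
}
```

With the flag set, an interrupt that finds an empty queue is treated as
handled, so it would not count toward the spurious-interrupt warning.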
It's better to do this before allocating the irq. But it does not look
easy unless we extend find_vqs().
Can you elaborate why that is better? At virtnet_open the interrupts
are not firing either.

I think you meant NAPI actually?
I meant interrupt: we don't have to worry about the spurious interrupt
warning when no interrupts will be firing. Until virtnet_open
completes, the device is down.


Ok.




I have no preference. Just curious, especially if it complicates the patch.

My understanding is that it's probably ok for net. But we probably need
to document the assumptions to make sure it is not abused in other drivers.

Introducing new parameters for find_vqs() can help to eliminate the subtle
parts, but I agree it looks like overkill.

(Btw, I forget the numbers, but I wonder how much difference it makes if we
simply remove free_old_xmit_skbs() from the rx NAPI path?)
The committed patchset did not record those numbers, but I found them
in an earlier iteration:

   [PATCH net-next 0/3] virtio-net tx napi
   https://lists.openwall.net/netdev/2017/04/02/55

It did seem to significantly reduce compute cycles ("Gcyc") at the
time. For instance:

     TCP_RR Latency (us):
     1x:
       p50              24       24       21
       p99              27       27       27
       Gcycles         299      432      308

I'm concerned that removing it now may cause a regression report in a
few months. That is higher risk than the spurious interrupt warning
that was only reported after years of use.


Right.

So if Michael is fine with this approach, I'm ok with it. But we probably need to add a TODO to invent interrupt handlers that can be used for more than one virtqueue. When MSI-X is enabled, the interrupt handler (vring_interrupt()) assumes the interrupt is used by a single virtqueue.
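
One possible shape for such a handler, sketched in userspace C. This is
not a proposal for the actual implementation: struct vq_model,
model_vq_interrupt() and model_shared_interrupt() are hypothetical names
used only to show the idea of one interrupt covering several queues.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; not kernel code. */
enum irq_ret { MODEL_IRQ_NONE = 0, MODEL_IRQ_HANDLED = 1 };

struct vq_model {
	int pending; /* used buffers not yet processed */
};

/* Per-queue handler: unhandled if the queue had nothing to do. */
static enum irq_ret model_vq_interrupt(struct vq_model *vq)
{
	if (vq->pending == 0)
		return MODEL_IRQ_NONE;

	vq->pending = 0; /* stand-in for invoking the virtqueue callback */
	return MODEL_IRQ_HANDLED;
}

/* Shared handler: poll every queue behind the interrupt and report it
 * as handled if at least one of them had work.
 */
static enum irq_ret model_shared_interrupt(struct vq_model *vqs, size_t n)
{
	enum irq_ret ret = MODEL_IRQ_NONE;

	assert(vqs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (model_vq_interrupt(&vqs[i]) == MODEL_IRQ_HANDLED)
			ret = MODEL_IRQ_HANDLED;
	}
	return ret;
}
```

The point of the sketch is only that the handled/unhandled decision is
made once per interrupt rather than once per virtqueue.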

Thanks




_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/virtualization



