Christian Borntraeger wrote:
Rusty, Anthony, Dor,
I need your brain power :-)
On smp guests I have seen a problem with virtio (the version in curent Avi's
git) which do not occur on single processor guests:
kernel BUG at /space/kvm/drivers/virtio/virtio_ring.c:228!
illegal operation: 0001 [#1]
Modules linked in: ipv6
CPU: 2 Not tainted
Process swapper (pid: 0, task: 000000000f83e038, ksp: 000000000f877d70)
Krnl PSW : 0704000180000000 000000000045df2a (vring_restart+0x5a/0x70)
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:0 PM:0 EA:3
Krnl GPRS: 00000000c0a80101 0000000000000000 000000000eb35000 0000000010005800
000000000045ded0 000000000000192f 000000000eb21000 000000000eb21000
000000000000000e 000000000eb21900 000000000eb21920 000000000f867cb8
0700000000d9b058 0000000000000010 000000000045c06a 000000000f867cb8
Krnl Code: 000000000045df1e: e3b0b0700004 lg %r11,112(%r11)
000000000045df24: 07fe bcr 15,%r14
000000000045df26: a7f40001 brc 15,45df28
>000000000045df2a: a7f4ffe1 brc 15,45deec
000000000045df2e: e31020300004 lg %r1,48(%r2)
000000000045df34: a7480000 lhi %r4,0
000000000045df38: 96011001 oi 1(%r1),1
000000000045df3c: a7f4ffef brc 15,45df1a
Call Trace:
([<000000000045c016>] virtnet_poll+0x96/0x42c)
[<000000000048cda2>] net_rx_action+0xca/0x150
[<0000000000137f7a>] __do_softirq+0x9e/0x130
[<00000000001105d6>] do_softirq+0xae/0xb4
[<0000000000138182>] irq_exit+0x96/0x9c
[<000000000010d710>] do_extint+0xcc/0xf8
[<00000000001135d0>] ext_no_vtime+0x16/0x1a
[<000000000010a57e>] cpu_idle+0x216/0x238
I think there is a valid code path, triggering this bug:
CPU1 CPU2
----------------------- -----------------------
- virtnet_poll found no
more packets on queue
- netif_rx_complete allow
poll to be called
- vq_ops->restart is called
- vq Interrupts are enabled
. <new packets arrive>
<vcpu is scheduled away>
. - interrupt is delivered
. - poll is called
. - poll work is done
. - netif_rx_complete
. - vq_ops->restart is called
. - check if vq interrupts are
. enable --> BUG
I didn't understand how its possible:
<vcpu is scheduled away>
. - interrupt is delivered
-vring_interrupt is called ->
- skb_recv_done callback return false ->
vq->vring.avail->flags |= VRING_AVAIL_F_NO_INTERRUPT;
So when the restart callback will be called the
BUG_ON(!(vq->vring.avail->flags & VRING_AVAIL_F_NO_INTERRUPT));
will not be issued.
. - poll is called
. - poll work is done
. - netif_rx_complete
. - vq_ops->restart is called
. - check if vq interrupts are
. enable --> BUG
The first idea was to remove this check? (See patch below). I am not sure
if the proper fix also requires to change vring.avail->flags to be only
changed by atomic bitops. Any ideas, comments?
As for now no harm can be done since it is only used in two place:
1. vring_restart inside a napi poll calback which is protected by napi
poll lock
2. vring_interrupt in the interrupt handler. While only the
VRING_AVAIL_F_NO_INTERRUPT bit is touched
there is no possible harm, once we'll use more bits it might be an
issue.
So Maybe we should use atomics.
Signed-off-by: Christian Borntraeger <borntraeger@xxxxxxxxxx>
CC: Anthony Liguori <aliguori@xxxxxxxxxx>
CC: Dor Laor <dor.laor@xxxxxxxxxxxx>
CC: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
---
drivers/virtio/virtio_ring.c | 2 --
1 file changed, 2 deletions(-)
Index: kvm/drivers/virtio/virtio_ring.c
===================================================================
--- kvm.orig/drivers/virtio/virtio_ring.c
+++ kvm/drivers/virtio/virtio_ring.c
@@ -225,8 +225,6 @@ static bool vring_restart(struct virtque
struct vring_virtqueue *vq = to_vvq(_vq);
START_USE(vq);
- BUG_ON(!(vq->vring.avail->flags & VRING_AVAIL_F_NO_INTERRUPT));
-
/* We optimistically turn back on interrupts, then check if there was
* more to do. */
vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linux-foundation.org/mailman/listinfo/virtualization