Re: virtio_net and SMP guests

Christian Borntraeger wrote:
Rusty, Anthony, Dor,

I need your brain power :-)

On SMP guests I have seen a problem with virtio (the version in Avi's current
git) which does not occur on single-processor guests:

kernel BUG at /space/kvm/drivers/virtio/virtio_ring.c:228!
illegal operation: 0001 [#1]
Modules linked in: ipv6
CPU:    2    Not tainted
Process swapper (pid: 0, task: 000000000f83e038, ksp: 000000000f877d70)
Krnl PSW : 0704000180000000 000000000045df2a (vring_restart+0x5a/0x70)
           R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:0 PM:0 EA:3
Krnl GPRS: 00000000c0a80101 0000000000000000 000000000eb35000 0000000010005800
           000000000045ded0 000000000000192f 000000000eb21000 000000000eb21000
           000000000000000e 000000000eb21900 000000000eb21920 000000000f867cb8
           0700000000d9b058 0000000000000010 000000000045c06a 000000000f867cb8
Krnl Code: 000000000045df1e: e3b0b0700004       lg      %r11,112(%r11)
           000000000045df24: 07fe               bcr     15,%r14
           000000000045df26: a7f40001           brc     15,45df28
          >000000000045df2a: a7f4ffe1           brc     15,45deec
           000000000045df2e: e31020300004       lg      %r1,48(%r2)
           000000000045df34: a7480000           lhi     %r4,0
           000000000045df38: 96011001           oi      1(%r1),1
           000000000045df3c: a7f4ffef           brc     15,45df1a
Call Trace:
([<000000000045c016>] virtnet_poll+0x96/0x42c)
 [<000000000048cda2>] net_rx_action+0xca/0x150
 [<0000000000137f7a>] __do_softirq+0x9e/0x130
 [<00000000001105d6>] do_softirq+0xae/0xb4
 [<0000000000138182>] irq_exit+0x96/0x9c
 [<000000000010d710>] do_extint+0xcc/0xf8
 [<00000000001135d0>] ext_no_vtime+0x16/0x1a
 [<000000000010a57e>] cpu_idle+0x216/0x238


I think there is a valid code path, triggering this bug:

	CPU1						CPU2
-----------------------				-----------------------
- virtnet_poll found no
  more packets on queue
- netif_rx_complete allow
  poll to be called
- vq_ops->restart is called
- vq Interrupts are enabled	
	.		     <new packets arrive>
<vcpu is scheduled away>
	.					- interrupt is delivered
	.					- poll is called
	.					- poll work is done
	.					- netif_rx_complete
	.					- vq_ops->restart is called
	.					- check if vq interrupts are
	.					  enabled --> BUG

I don't understand how that is possible:

<vcpu is scheduled away>
	.					- interrupt is delivered
	.					- vring_interrupt is called
	.					- the skb_recv_done callback returns false, so
	.					  vq->vring.avail->flags |= VRING_AVAIL_F_NO_INTERRUPT;

So by the time the restart callback is called, the flag is set again and
BUG_ON(!(vq->vring.avail->flags & VRING_AVAIL_F_NO_INTERRUPT));
will not fire.

	.					- poll is called
	.					- poll work is done
	.					- netif_rx_complete
	.					- vq_ops->restart is called
	.					- check if vq interrupts are
	.					  enabled --> BUG
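
To make the two readings of the interleaving concrete, here is a minimal single-threaded sketch in plain C. It is only a model under stated assumptions: struct model_vq and the model_* and scenario_* helpers are hypothetical names, not the virtio_ring code, and the flag value 1 is assumed for illustration. It replays the steps in order and reports whether the check in vring_restart would pass.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value for illustration; the real definition lives in the virtio headers. */
#define VRING_AVAIL_F_NO_INTERRUPT 1

/* Hypothetical stand-in for vq->vring.avail->flags. */
struct model_vq {
	uint16_t avail_flags;
};

/* Models vring_interrupt when the callback returns false:
 * further interrupts are suppressed. */
static void model_interrupt(struct model_vq *vq)
{
	vq->avail_flags |= VRING_AVAIL_F_NO_INTERRUPT;
}

/* Models the condition guarded by BUG_ON in vring_restart.
 * Returns false where the real code would hit the BUG. */
static bool model_restart_check(const struct model_vq *vq)
{
	return (vq->avail_flags & VRING_AVAIL_F_NO_INTERRUPT) != 0;
}

/* Models the rest of vring_restart: interrupts are turned back on. */
static void model_restart(struct model_vq *vq)
{
	vq->avail_flags &= (uint16_t)~VRING_AVAIL_F_NO_INTERRUPT;
}

/* First reading: CPU2 reaches restart without the flag being set again. */
static bool scenario_flag_not_set_again(void)
{
	struct model_vq vq = { .avail_flags = VRING_AVAIL_F_NO_INTERRUPT };

	model_restart(&vq);              /* CPU1: vq_ops->restart */
	return model_restart_check(&vq); /* CPU2: check fails, BUG */
}

/* Second reading: CPU2 enters through vring_interrupt first. */
static bool scenario_via_interrupt(void)
{
	struct model_vq vq = { .avail_flags = VRING_AVAIL_F_NO_INTERRUPT };

	model_restart(&vq);              /* CPU1: vq_ops->restart */
	model_interrupt(&vq);            /* CPU2: callback returns false */
	return model_restart_check(&vq); /* CPU2: check passes */
}
```

In this model scenario_flag_not_set_again() returns false and scenario_via_interrupt() returns true, so the open question is whether CPU2 can really get into poll without passing through the vring_interrupt path.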


The first idea was to remove this check (see patch below). I am not sure
whether the proper fix also requires that vring.avail->flags be changed
only with atomic bitops. Any ideas, comments?
For now no harm can be done, since the flags are only touched in two places:
1. vring_restart, inside the napi poll callback, which is protected by the
   napi poll lock
2. vring_interrupt, in the interrupt handler
As long as only the VRING_AVAIL_F_NO_INTERRUPT bit is touched there is no
possible harm; once we use more bits it might become an issue.
So maybe we should use atomics.
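
As a rough illustration of what atomic updates of the flags word would buy us once more bits are in use, here is a userspace sketch using C11 <stdatomic.h>. It is an analogue only: kernel code would use the kernel's own atomic bitops rather than these calls, and the flags_* helper names and the unsigned int flags word are assumptions made for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed value for illustration. */
#define VRING_AVAIL_F_NO_INTERRUPT 1u

/* Atomically set the NO_INTERRUPT bit; other bits are left untouched
 * even if another CPU modifies them concurrently. */
static void flags_disable_interrupts(atomic_uint *flags)
{
	atomic_fetch_or(flags, VRING_AVAIL_F_NO_INTERRUPT);
}

/* Atomically clear the NO_INTERRUPT bit. */
static void flags_enable_interrupts(atomic_uint *flags)
{
	atomic_fetch_and(flags, ~VRING_AVAIL_F_NO_INTERRUPT);
}

/* Read the current state of the NO_INTERRUPT bit. */
static bool flags_interrupts_disabled(atomic_uint *flags)
{
	return (atomic_load(flags) & VRING_AVAIL_F_NO_INTERRUPT) != 0;
}
```

With a plain |= or &= the read-modify-write can lose a concurrent update to a different bit in the same word; the atomic variants avoid that, at the cost of a more expensive operation on every enable and disable.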
Signed-off-by: Christian Borntraeger <borntraeger@xxxxxxxxxx>
CC: Anthony Liguori <aliguori@xxxxxxxxxx>
CC: Dor Laor <dor.laor@xxxxxxxxxxxx>
CC: Rusty Russell <rusty@xxxxxxxxxxxxxxx>

---
 drivers/virtio/virtio_ring.c |    2 --
 1 file changed, 2 deletions(-)

Index: kvm/drivers/virtio/virtio_ring.c
===================================================================
--- kvm.orig/drivers/virtio/virtio_ring.c
+++ kvm/drivers/virtio/virtio_ring.c
@@ -225,8 +225,6 @@ static bool vring_restart(struct virtque
 	struct vring_virtqueue *vq = to_vvq(_vq);
 
 	START_USE(vq);
-	BUG_ON(!(vq->vring.avail->flags & VRING_AVAIL_F_NO_INTERRUPT));
-
 	/* We optimistically turn back on interrupts, then check if there was
 	 * more to do. */
 	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;


