On Mon, 26 Oct 2009 03:33:40 am Michael S. Tsirkin wrote: > virtio net used to unlink skbs from send queues on error, > but ever since 48925e372f04f5e35fec6269127c62b2c71ab794 > we do not do this. This causes guest data corruption and crashes > with vhost since net core can requeue the skb or free it without > it being taken off the list. > > This patch fixes this by queueing the skb after successfull > transmit. I originally thought that this was racy: as soon as we do add_buf, we need to make sure we're ready for the callback (for virtio_pci, it's ->kick, but we shouldn't rely on that). So a comment would be nice. How's this? Subject: virtio-net: fix data corruption with OOM Date: Sun, 25 Oct 2009 19:03:40 +0200 From: "Michael S. Tsirkin" <mst@xxxxxxxxxx> virtio net used to unlink skbs from send queues on error, but ever since 48925e372f04f5e35fec6269127c62b2c71ab794 we do not do this. This causes guest data corruption and crashes with vhost since net core can requeue the skb or free it without it being taken off the list. This patch fixes this by queueing the skb after successful transmit. Signed-off-by: Michael S. Tsirkin <mst@xxxxxxxxxx> Signed-off-by: Rusty Russell <rusty@xxxxxxxxxxxxxxx> (+ comment) --- Rusty, here's a fix for another data corrupter I saw. This fixes a regression from 2.6.31, so definitely 2.6.32 I think. Comments? drivers/net/virtio_net.c | 8 +++++--- 1 files changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -516,8 +516,7 @@ again: /* Free up any pending old buffers before queueing new ones. */ free_old_xmit_skbs(vi); - /* Put new one in send queue and do transmit */ - __skb_queue_head(&vi->send, skb); + /* Try to transmit */ capacity = xmit_skb(vi, skb); /* This can happen with OOM and indirect buffers. */ @@ -531,8 +530,17 @@ again: } return NETDEV_TX_BUSY; } + vi->svq->vq_ops->kick(vi->svq); - vi->svq->vq_ops->kick(vi->svq); + /* + * Put new one in send queue. You'd expect we'd need this before + * xmit_skb calls add_buf(), since the callback can be triggered + * immediately after that. But since the callback just triggers + * another call back here, normal network xmit locking prevents the + * race. + */ + __skb_queue_head(&vi->send, skb); + /* Don't wait up for transmitted skbs to be freed. */ skb_orphan(skb); nf_reset(skb); _______________________________________________ Virtualization mailing list Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linux-foundation.org/mailman/listinfo/virtualization