On Fri, Nov 11, 2011 at 06:32:23PM +0530, Krishna Kumar wrote:
> This patch series resurrects the earlier multiple TX/RX queues
> functionality for virtio_net, and addresses the issues pointed
> out.

Some general questions/issues with the approach this patchset takes:

1. Lack of host-guest synchronization for flow hash.
   On the host side, things will scale if the same vhost thread handles
   both transmit and receive for a specific flow. Further, things will
   scale if packets from distinct guest queues get routed to distinct
   queues on the NIC and tap devices in the host. It seems that to
   achieve both, host and guest need to pass the flow hash information
   to each other. Ben Hutchings suggested effectively pushing the
   guest's RFS socket map out to the host. Any thoughts on this?

2. Reduced batching/increased number of exits.
   It's easy to see that the amount of work per VQ is reduced with this
   patch. Thus it's easy to imagine that under some workloads, where we
   previously had X packets per VM exit/interrupt, we'll now have X/N,
   with N the number of virtqueues. Since both a VM exit and an
   interrupt are expensive operations, one wonders whether this can
   lead to performance regressions. It seems that, to reduce the chance
   of such regressions, some adaptive strategy would work better. But
   how would we then ensure packets aren't reordered? Any thoughts?

3. Lack of userspace resource control.
   A vhost-net device already uses quite a lot of resources, and this
   patch seems to make the problem worse. At the moment, management can
   control that to some extent by using a file descriptor per virtio
   device, so using a file descriptor per VQ has the advantage of
   limiting the amount of resources qemu can consume. In April, Jason
   posted a qemu patch that supported a multiqueue guest by using the
   existing vhost interfaces, opening multiple devices, one per queue.
   It seems that this can be improved upon if we allow e.g. sharing of
   memory maps between file descriptors. This might also make adaptive
   queueing strategies possible. Would it be possible to do this
   instead? (A rough sketch of what I mean is at the bottom of this
   mail.)

> It also includes an API to share irqs, e.g. amongst the
> TX vqs.
> I plan to run TCP/UDP STREAM and RR tests for local->host and
> local->remote, and send the results in the next couple of days.

Please do. Small message throughput would be especially interesting.

> patch #1: Introduce VIRTIO_NET_F_MULTIQUEUE
> patch #2: Move 'num_queues' to virtqueue
> patch #3: virtio_net driver changes
> patch #4: vhost_net changes
> patch #5: Implement find_vqs_irq()
> patch #6: Convert virtio_net driver to use find_vqs_irq()
>
> Changes from rev2:
> Michael:
> -------
> 1. Added functions to handle setting RX/TX/CTRL vq's.
> 2. num_queue_pairs instead of numtxqs.
> 3. Experimental support for fewer irq's in find_vqs.
>
> Rusty:
> ------
> 4. Cleaned up some existing "while (1)".
> 5. rvq/svq and rx_sg/tx_sg changed to vq and sg respectively.
> 6. Cleaned up some "#if 1" code.
>
> Issue when using patch5:
> -------------------------
>
> The new API is designed to minimize code duplication. E.g.
> vp_find_vqs() is implemented as:
>
> static int vp_find_vqs(...)
> {
>         return vp_find_vqs_irq(vdev, nvqs, vqs, callbacks, names, NULL);
> }
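
One thing I'd double-check in patch #5: when several VQs share a single
vector, the handler for that vector has to scan every VQ bound to it,
since there is no way to tell which one triggered. Roughly along the
lines of what vp_vring_interrupt already does for the "all VQs on one
vector" fallback (untested sketch; the list/lock names follow today's
virtio_pci, your per-vector bookkeeping may well differ):

/*
 * Untested sketch: a vector shared by several VQs must poll all of
 * them on each interrupt.  If a VQ bound to the shared vector is
 * skipped here, its queue can stall waiting for a wakeup that never
 * arrives.
 */
static irqreturn_t vp_shared_vector_interrupt(int irq, void *opaque)
{
	struct virtio_pci_device *vp_dev = opaque;
	struct virtio_pci_vq_info *info;
	irqreturn_t ret = IRQ_NONE;
	unsigned long flags;

	spin_lock_irqsave(&vp_dev->lock, flags);
	list_for_each_entry(info, &vp_dev->virtqueues, node)
		if (vring_interrupt(irq, info->vq) == IRQ_HANDLED)
			ret = IRQ_HANDLED;
	spin_unlock_irqrestore(&vp_dev->lock, flags);

	return ret;
}

In particular it would be worth verifying that each TX VQ ends up on
exactly one such per-vector list.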
> In my testing, when multiple tx/rx queues are used with multiple
> netperf sessions, all the device tx queues stop a few thousand times
> and are subsequently woken up by skb_xmit_done. But after some
> 40K-50K iterations of stop/wake, some of the txqs stop and no wake
> interrupt comes. (modprobe -r followed by modprobe solves this, so
> it is not a system hang). At the time of the hang (#txqs=#rxqs=4):
>
> # egrep "CPU|virtio0" /proc/interrupts | grep -v config
>            CPU0       CPU1       CPU2       CPU3
>  41:      49057      49262      48828      49421   PCI-MSI-edge   virtio0-input.0
>  42:       5066       5213       5221       5109   PCI-MSI-edge   virtio0-output.0
>  43:      43380      43770      43007      43148   PCI-MSI-edge   virtio0-input.1
>  44:      41433      41727      42101      41175   PCI-MSI-edge   virtio0-input.2
>  45:      38465      37629      38468      38768   PCI-MSI-edge   virtio0-input.3
>
> # tc -s qdisc show dev eth0
> qdisc mq 0: root
>  Sent 393196939897 bytes 271191624 pkt (dropped 59897, overlimits 0 requeues 67156)
>  backlog 25375720b 1601p requeues 67156
>
> I am not sure if patch #5 is responsible for the hang. Also, without
> patch #5/patch #6, I changed vp_find_vqs() to:
>
> static int vp_find_vqs(...)
> {
>         return vp_try_to_find_vqs(vdev, nvqs, vqs, callbacks, names,
>                                   false, false);
> }
>
> No packets were getting TX'd with this change when #txqs > 1. This is
> with the MQ-only patch that doesn't touch the drivers/virtio/
> directory.
>
> Also, the MQ patch works reasonably well with 2 vectors - with
> use_msix=1 and per_vq_vectors=0 in vp_find_vqs().
>
> Patch against net-next - please review.
>
> Signed-off-by: krkumar2@xxxxxxxxxx
> ---
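
Coming back to point 3 above: to make the suggestion more concrete,
this is roughly what a multiqueue setup over the existing interface
has to do today, with one vhost-net fd per queue pair (userspace
sketch only; error handling is omitted and the memory layout is made
up):

/*
 * Sketch only: today each vhost-net fd gets its own copy of the same
 * guest memory table.  Values below are illustrative, not real.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/vhost.h>

#define NQUEUES   4
#define RAM_SIZE  (1ULL << 30)

static int open_vhost_queue(struct vhost_memory *mem)
{
	int fd = open("/dev/vhost-net", O_RDWR);

	ioctl(fd, VHOST_SET_OWNER, NULL);
	/* Every fd carries a full copy of the guest memory map. */
	ioctl(fd, VHOST_SET_MEM_TABLE, mem);
	return fd;
}

int main(void)
{
	struct vhost_memory *mem;
	void *ram;
	int fds[NQUEUES];
	int i;

	/* Stand-in for guest RAM. */
	ram = mmap(NULL, RAM_SIZE, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	mem = calloc(1, sizeof(*mem) + sizeof(struct vhost_memory_region));
	mem->nregions = 1;
	mem->regions[0].guest_phys_addr = 0;
	mem->regions[0].memory_size = RAM_SIZE;
	mem->regions[0].userspace_addr = (uintptr_t)ram;

	for (i = 0; i < NQUEUES; i++)
		fds[i] = open_vhost_queue(mem);

	/* VHOST_SET_VRING_*, VHOST_NET_SET_BACKEND etc. follow per fd. */
	return 0;
}

The per-fd copy of the memory table (and the per-fd worker it implies)
is the resource cost I'm worried about. If several fds could instead
reference one shared map, management would keep a per-VQ handle while
the cost of each extra queue stays small, and it might also open the
door to adaptive queueing on the host side.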