On Wed, Jul 23, 2014 at 11:42 AM, Jason Wang <jasowang@xxxxxxxxxx> wrote:
>
> On 07/23/2014 04:12 PM, Razya Ladelsky wrote:
> > Jason Wang <jasowang@xxxxxxxxxx> wrote on 23/07/2014 08:26:36 AM:
> >
> >> From: Jason Wang <jasowang@xxxxxxxxxx>
> >> To: Razya Ladelsky/Haifa/IBM@IBMIL, kvm@xxxxxxxxxxxxxxx, "Michael S. Tsirkin" <mst@xxxxxxxxxx>
> >> Cc: abel.gordon@xxxxxxxxx, Joel Nider/Haifa/IBM@IBMIL, Yossi Kuperman1/Haifa/IBM@IBMIL, Eran Raichstein/Haifa/IBM@IBMIL, Alex Glikson/Haifa/IBM@IBMIL
> >> Date: 23/07/2014 08:26 AM
> >> Subject: Re: [PATCH] vhost: Add polling mode
> >>
> >> On 07/21/2014 09:23 PM, Razya Ladelsky wrote:
> >>> Hello All,
> >>>
> >>> When vhost is waiting for buffers from the guest driver (e.g., more packets
> >>> to send in vhost-net's transmit queue), it normally goes to sleep and waits
> >>> for the guest to "kick" it. This kick involves a PIO in the guest, and
> >>> therefore an exit (and possibly userspace involvement in translating this
> >>> PIO exit into a file descriptor event), all of which hurts performance.
> >>>
> >>> If the system is under-utilized (has cpu time to spare), vhost can
> >>> continuously poll the virtqueues for new buffers, and avoid asking
> >>> the guest to kick us. This patch adds an optional polling mode to vhost,
> >>> which can be enabled via a kernel module parameter, "poll_start_rate".
> >>>
> >>> When polling is active for a virtqueue, the guest is asked to disable
> >>> notification (kicks), and the worker thread continuously checks for
> >>> new buffers. When it does discover new buffers, it simulates a "kick" by
> >>> invoking the underlying backend driver (such as vhost-net), which thinks it
> >>> got a real kick from the guest, and acts accordingly. If the underlying
> >>> driver asks not to be kicked, we disable polling on this virtqueue.
> >>>
> >>> We start polling on a virtqueue when we notice it has work to do.
> >>> Polling on this virtqueue is later disabled after 3 seconds of polling
> >>> turning up no new work, as in this case we are better off returning
> >>> to the exit-based notification mechanism. The default timeout of 3 seconds
> >>> can be changed with the "poll_stop_idle" kernel module parameter.
> >>>
> >>> This polling approach makes a lot of sense for new HW with posted interrupts,
> >>> for which we have exitless host-to-guest notifications. But even with
> >>> support for posted interrupts, guest-to-host communication still causes
> >>> exits. Polling adds the missing part.
> >>>
> >>> When systems are overloaded, there won't be enough cpu time for the various
> >>> vhost threads to poll their guests' devices. For these scenarios, we plan
> >>> to add support for vhost threads that can be shared by multiple devices,
> >>> even of multiple vms.
> >>> Our ultimate goal is to implement the I/O acceleration features described in:
> >>> KVM Forum 2013: Efficient and Scalable Virtio (by Abel Gordon)
> >>> https://www.youtube.com/watch?v=9EyweibHfEs
> >>> and
> >>> https://www.mail-archive.com/kvm@xxxxxxxxxxxxxxx/msg98179.html
> >>>
> >>> Comments are welcome,
> >>> Thank you,
> >>> Razya
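To make the "simulated kick" above a bit more concrete, here is a rough sketch of the check I read the description as implying (my sketch, not code quoted from the patch; only vq->avail_idx, vq->handle_kick and vq->poll.work are existing vhost fields, the function name is mine): while polling, re-read the guest's avail index, and if it has advanced since we last looked, invoke the virtqueue's kick handler exactly as if a real guest kick had arrived.

/* Sketch only: the "simulated kick" path implied by the cover letter. */
static void vhost_vq_poll_once(struct vhost_virtqueue *vq)
{
	u16 avail_idx;

	if (__get_user(avail_idx, &vq->avail->idx))
		return;			/* couldn't read the avail index */
	if (avail_idx != vq->avail_idx) {
		/* The guest posted new buffers: act as if it had kicked us. */
		vq->handle_kick(&vq->poll.work);
	}
}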
> >> Thanks for the work. Do you have perf numbers for this?
> >>
> > Hi Jason,
> > Thanks for reviewing. I ran some experiments with TCP stream netperf and
> > filebench (having 2 threads performing random reads) benchmarks on an
> > IBM System x3650 M4.
> > All runs loaded the guests in a way that they were (cpu) saturated.
> > The system had two cores per guest, to allow both the vcpu and the vhost
> > thread to run concurrently for maximum throughput (but I didn't pin the
> > threads to specific cores).
> > I get:
> >
> > Netperf, 1 vm:
> > The polling patch improved throughput by ~33%. Number of exits/sec decreased 6x.
> > The same improvement was shown when I tested with 3 vms running netperf.
> >
> > filebench, 1 vm:
> > ops/sec improved by 13% with the polling patch. Number of exits was reduced by 31%.
> > The same experiment with 3 vms running filebench showed similar numbers.
>
> Looks good, it may be worth adding these results to the commit log.
>
> >> And it looks like the patch only polls the virtqueue. In the future, it
> >> may be worth adding callbacks for vhost_net to poll the socket as well.
> >> Then it could be used with rx busy polling in the host, which may speed
> >> up rx too.
> >
> > Did you mean polling the network device to avoid interrupts?
>
> Yes, recent Linux hosts support rx busy polling, which can reduce interrupts.
> If vhost can utilize this, it can also reduce the latency caused by vhost
> thread wakeups.
>
> And I'm also working on virtio-net busy polling in the guest; if vhost can
> poll the socket, it can also help guest rx polling.

Nice :) Note that you may want to check whether the processor supports posted
interrupts. I guess that if the CPU supports posted interrupts, then the
benefits of polling in the front-end (from a performance perspective) may not
be worth the cpu cycles wasted in the guest.

> >>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> >>> index c90f437..678d766 100644
> >>> --- a/drivers/vhost/vhost.c
> >>> +++ b/drivers/vhost/vhost.c
> >>> @@ -24,9 +24,17 @@
> >>>  #include <linux/slab.h>
> >>>  #include <linux/kthread.h>
> >>>  #include <linux/cgroup.h>
> >>> +#include <linux/jiffies.h>
> >>>  #include <linux/module.h>
> >>>
> >>>  #include "vhost.h"
> >>> +static int poll_start_rate = 0;
> >>> +module_param(poll_start_rate, int, S_IRUGO|S_IWUSR);
> >>> +MODULE_PARM_DESC(poll_start_rate, "Start continuous polling of virtqueue when rate of events is at least this number per jiffy. If 0, never start polling.");
> >>> +
> >>> +static int poll_stop_idle = 3*HZ; /* 3 seconds */
> >>> +module_param(poll_stop_idle, int, S_IRUGO|S_IWUSR);
> >>> +MODULE_PARM_DESC(poll_stop_idle, "Stop continuous polling of virtqueue after this many jiffies of no work.");
> >>>
> >> I'm not sure using jiffies is good enough, since the user needs to know
> >> the HZ value. It may be worth looking at sk_busy_loop(), which uses
> >> sched_clock() and microseconds.
> >
> > Ok, will look into it, thanks.
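To illustrate what that suggestion could look like (purely a sketch with hypothetical names, not code from the patch or from sk_busy_loop()): keep the idle limit in microseconds and measure elapsed time with sched_clock(), so the admin never needs to know HZ.

/* Sketch: a microsecond-based idle check in the spirit of sk_busy_loop().
 * poll_stop_idle_us and vqpoll_idle_too_long() are hypothetical names.
 */
static unsigned int poll_stop_idle_us = 3 * USEC_PER_SEC;	/* 3 seconds */
module_param(poll_stop_idle_us, uint, S_IRUGO|S_IWUSR);
MODULE_PARM_DESC(poll_stop_idle_us, "Stop continuous polling of virtqueue after this many microseconds of no work.");

static bool vqpoll_idle_too_long(u64 last_work_ns)
{
	/* sched_clock() returns nanoseconds since boot */
	return sched_clock() - last_work_ns >
	       (u64)poll_stop_idle_us * NSEC_PER_USEC;
}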
> >>> +/* Enable or disable virtqueue polling (vqpoll.enabled) for a virtqueue.
> >>> + *
> >>> + * Enabling this mode tells the guest not to notify ("kick") us when it
> >>> + * has made more work available on this virtqueue; rather, we will
> >>> + * continuously poll this virtqueue in the worker thread. If multiple
> >>> + * virtqueues are polled, the worker thread polls them all, e.g., in a
> >>> + * round-robin fashion.
> >>> + * Note that vqpoll.enabled doesn't always mean that this virtqueue is
> >>> + * actually being polled: the backend (e.g., net.c) may temporarily
> >>> + * disable it using vhost_disable/enable_notify(), while vqpoll.enabled
> >>> + * is unchanged.
> >>> + *
> >>> + * It is assumed that these functions are called relatively rarely, when
> >>> + * vhost notices that this virtqueue's usage pattern significantly changed
> >>> + * in a way that makes polling more efficient than notification, or vice
> >>> + * versa.
> >>> + * Also, we assume that vhost_vq_disable_vqpoll() is always called on vq
> >>> + * cleanup, so any allocations done by vhost_vq_enable_vqpoll() can be
> >>> + * reclaimed.
> >>> + */
> >>> +static void vhost_vq_enable_vqpoll(struct vhost_virtqueue *vq)
> >>> +{
> >>> +	if (vq->vqpoll.enabled)
> >>> +		return; /* already enabled, nothing to do */
> >>> +	if (!vq->handle_kick)
> >>> +		return; /* polling will be a waste of time if no callback! */
> >>> +	if (!(vq->used_flags & VRING_USED_F_NO_NOTIFY)) {
> >>> +		/* vq has guest notifications enabled. Disable them,
> >>> +		   and instead add vq to the polling list */
> >>> +		list_add_tail(&vq->vqpoll.link, &vq->dev->vqpoll_list);
> >> This may work when there are at most two vqs in the list. But consider
> >> that you may want to poll a lot of vqs in the future; it may take a long
> >> time for this vq to get polled. So probably we can just keep used_flags
> >> untouched: if the vq gets kicked, it can be served soon.
> >
> > Indeed there is a patch ready for polling multiple virtqueues, and it has
> > a better scheduling algorithm that avoids virtqueue starvation.
>
> Ok.
>
> >>> +	}
> >>> +	vq->vqpoll.jiffies_last_kick = jiffies;
> >>> +	__get_user(vq->avail_idx, &vq->avail->idx);
> >>> +	vq->vqpoll.enabled = true;
> >>> +
> >>> +	/* Map userspace's vq->avail to the kernel's memory space. */
> >>> +	if (get_user_pages_fast((unsigned long)vq->avail, 1, 0,
> >>> +				&vq->vqpoll.avail_page) != 1) {
> >>> +		/* TODO: can this happen, as we check access
> >>> +		   to vq->avail in advance? */
> >>> +		BUG();
> >>> +	}
> >>> +	vq->vqpoll.avail_mapped = (struct vring_avail *) (
> >>> +		(unsigned long)kmap(vq->vqpoll.avail_page) |
> >>> +		((unsigned long)vq->avail & ~PAGE_MASK));
> >> Is it a must to map the avail page here?
> >>
> > No. This is indeed in preparation for the next patch handling multiple
> > queues by a single vhost thread, where we'd like to map these pages for
> > performance.
>
> Makes sense. Then for the gup failure, we can just safely disable polling
> in that case.
>
>
> >>> +		    vq->vqpoll.work_this_jiffy >=
> >>> +			poll_start_rate) {
> >>> +			vhost_vq_enable_vqpoll(vq);
> >>> +		}
> >>> +	}
> >>> +	/* Check one virtqueue from the round-robin list */
> >>> +	if (!list_empty(&dev->vqpoll_list)) {
> >> If we still have other work in work_list, we may want to serve it first.
> >
> > You may be right. We've done a lot of experiments with this method, which
> > seems to work well. I prefer leaving it this way for now, but your
> > approach is worthwhile to investigate as well.
>
> Ok.
>
> >> [...]
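For what it's worth, here is a rough sketch of a worker iteration that serves pending work first and only then polls one virtqueue, rotating it to the tail of the list so no vq starves. This is illustrative only; the multi-vq scheduling patch Razya mentions may well do it differently, and the helper name and vhost_vq_poll_once() (from the sketch earlier in this mail) are mine.

/* Sketch of one worker iteration: entries on work_list win over polling,
 * and a polled vq is rotated to the tail of vqpoll_list so every polled vq
 * eventually gets its turn.
 */
static void vhost_poll_round_robin(struct vhost_dev *dev)
{
	struct vhost_virtqueue *vq;

	if (!list_empty(&dev->work_list))
		return;		/* real work queued: serve it first */
	if (list_empty(&dev->vqpoll_list))
		return;		/* nothing is being polled */

	vq = list_first_entry(&dev->vqpoll_list,
			      struct vhost_virtqueue, vqpoll.link);
	list_move_tail(&vq->vqpoll.link, &dev->vqpoll_list);	/* rotate */
	vhost_vq_poll_once(vq);
}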
> >>>  struct vhost_dev {
> >>> @@ -123,6 +151,7 @@ struct vhost_dev {
> >>>  	spinlock_t work_lock;
> >>>  	struct list_head work_list;
> >>>  	struct task_struct *worker;
> >>> +	struct list_head vqpoll_list;
> >>>  };
> >>>
> >>>  void vhost_dev_init(struct vhost_dev *, struct vhost_virtqueue **vqs, int nvqs);
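For completeness, here is the kind of setup the new field implies (only a sketch of the obvious initialization, not a quote from the patch): vqpoll_list needs an INIT_LIST_HEAD() in vhost_dev_init(), and each vq's vqpoll.link needs one too, so that list_empty() and list_del() are well-defined before a vq is ever polled.

/* Sketch: fragment to be placed in vhost_dev_init(), after dev->vqs and
 * dev->nvqs have been set. 'i' is the loop counter already used there.
 */
	INIT_LIST_HEAD(&dev->vqpoll_list);
	for (i = 0; i < dev->nvqs; ++i)
		INIT_LIST_HEAD(&dev->vqs[i]->vqpoll.link);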