On Wed, Jul 23, 2014 at 11:42 AM, Jason Wang <jasowang@xxxxxxxxxx> wrote:
>
> On 07/23/2014 04:12 PM, Razya Ladelsky wrote:
> > Jason Wang <jasowang@xxxxxxxxxx> wrote on 23/07/2014 08:26:36 AM:
> >
> >> From: Jason Wang <jasowang@xxxxxxxxxx>
> >> To: Razya Ladelsky/Haifa/IBM@IBMIL, kvm@xxxxxxxxxxxxxxx, "Michael S. Tsirkin" <mst@xxxxxxxxxx>
> >> Cc: abel.gordon@xxxxxxxxx, Joel Nider/Haifa/IBM@IBMIL, Yossi Kuperman1/Haifa/IBM@IBMIL, Eran Raichstein/Haifa/IBM@IBMIL, Alex Glikson/Haifa/IBM@IBMIL
> >> Date: 23/07/2014 08:26 AM
> >> Subject: Re: [PATCH] vhost: Add polling mode
> >>
> >> On 07/21/2014 09:23 PM, Razya Ladelsky wrote:
> >>> Hello All,
> >>>
> >>> When vhost is waiting for buffers from the guest driver (e.g., more packets
> >>> to send in vhost-net's transmit queue), it normally goes to sleep and waits
> >>> for the guest to "kick" it. This kick involves a PIO in the guest, and
> >>> therefore an exit (and possibly userspace involvement in translating this
> >>> PIO exit into a file descriptor event), all of which hurts performance.
> >>>
> >>> If the system is under-utilized (has cpu time to spare), vhost can
> >>> continuously poll the virtqueues for new buffers, and avoid asking
> >>> the guest to kick us. This patch adds an optional polling mode to vhost,
> >>> which can be enabled via a kernel module parameter, "poll_start_rate".
> >>>
> >>> When polling is active for a virtqueue, the guest is asked to disable
> >>> notification (kicks), and the worker thread continuously checks for
> >>> new buffers. When it does discover new buffers, it simulates a "kick" by
> >>> invoking the underlying backend driver (such as vhost-net), which thinks it
> >>> got a real kick from the guest, and acts accordingly. If the underlying
> >>> driver asks not to be kicked, we disable polling on this virtqueue.
> >>>
> >>> We start polling on a virtqueue when we notice it has work to do.
> >>> Polling on this virtqueue is later disabled after 3 seconds of polling
> >>> turning up no new work, as in this case we are better off returning
> >>> to the exit-based notification mechanism. The default timeout of 3 seconds
> >>> can be changed with the "poll_stop_idle" kernel module parameter.
> >>>
> >>> This polling approach makes a lot of sense for new HW with posted interrupts,
> >>> for which we have exitless host-to-guest notifications. But even with
> >>> support for posted interrupts, guest-to-host communication still causes
> >>> exits. Polling adds the missing part.
> >>>
> >>> When systems are overloaded, there won't be enough cpu time for the various
> >>> vhost threads to poll their guests' devices. For these scenarios, we plan
> >>> to add support for vhost threads that can be shared by multiple devices,
> >>> even of multiple vms.
> >>> Our ultimate goal is to implement the I/O acceleration features described in:
> >>> KVM Forum 2013: Efficient and Scalable Virtio (by Abel Gordon)
> >>> https://www.youtube.com/watch?v=9EyweibHfEs
> >>> and
> >>> https://www.mail-archive.com/kvm@xxxxxxxxxxxxxxx/msg98179.html
> >>>
> >>> Comments are welcome,
> >>> Thank you,
> >>> Razya
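To make the "simulated kick" above a bit more concrete, here is a rough sketch of the check I read the description as implying (my sketch, not code quoted from the patch; only vq->avail_idx, vq->handle_kick and vq->poll.work are existing vhost fields, the function name is mine): while polling, re-read the guest's avail index, and if it has advanced since we last looked, invoke the virtqueue's kick handler exactly as if a real guest kick had arrived.

/* Sketch only: the "simulated kick" path implied by the cover letter. */
static void vhost_vq_poll_once(struct vhost_virtqueue *vq)
{
	u16 avail_idx;

	if (__get_user(avail_idx, &vq->avail->idx))
		return;			/* couldn't read the avail index */
	if (avail_idx != vq->avail_idx) {
		/* The guest posted new buffers: act as if it had kicked us. */
		vq->handle_kick(&vq->poll.work);
	}
}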
> >> Thanks for the work. Do you have perf numbers for this?
> >>
> > Hi Jason,
> > Thanks for reviewing. I ran some experiments with TCP stream netperf and
> > filebench (having 2 threads performing random reads) benchmarks on an
> > IBM System x3650 M4.
> > All runs loaded the guests in a way that they were (cpu) saturated.
> > The system had two cores per guest, to allow both the vcpu and the vhost
> > thread to run concurrently for maximum throughput (but I didn't pin the
> > threads to specific cores).
> > I get:
> >
> > Netperf, 1 vm:
> > The polling patch improved throughput by ~33%. Number of exits/sec decreased 6x.
> > The same improvement was shown when I tested with 3 vms running netperf.
> >
> > filebench, 1 vm:
> > ops/sec improved by 13% with the polling patch. Number of exits was reduced by 31%.
> > The same experiment with 3 vms running filebench showed similar numbers.
>
> Looks good, it may be worth adding these results to the commit log.
>
> >> And it looks like the patch only polls the virtqueue. In the future, it
> >> may be worth adding callbacks for vhost_net to poll the socket as well.
> >> Then it could be used with rx busy polling in the host, which may speed
> >> up rx too.
> >
> > Did you mean polling the network device to avoid interrupts?
>
> Yes, recent Linux hosts support rx busy polling, which can reduce interrupts.
> If vhost can utilize this, it can also reduce the latency caused by vhost
> thread wakeups.
>
> And I'm also working on virtio-net busy polling in the guest; if vhost can
> poll the socket, it can also help guest rx polling.

Nice :) Note that you may want to check whether the processor supports posted
interrupts. I guess that if the CPU supports posted interrupts, then the
benefits of polling in the front-end (from a performance perspective) may not
be worth the cpu cycles wasted in the guest.

> >>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> >>> index c90f437..678d766 100644
> >>> --- a/drivers/vhost/vhost.c
> >>> +++ b/drivers/vhost/vhost.c
> >>> @@ -24,9 +24,17 @@
> >>>  #include <linux/slab.h>
> >>>  #include <linux/kthread.h>
> >>>  #include <linux/cgroup.h>
> >>> +#include <linux/jiffies.h>
> >>>  #include <linux/module.h>
> >>>
> >>>  #include "vhost.h"
> >>> +static int poll_start_rate = 0;
> >>> +module_param(poll_start_rate, int, S_IRUGO|S_IWUSR);
> >>> +MODULE_PARM_DESC(poll_start_rate, "Start continuous polling of virtqueue when rate of events is at least this number per jiffy. If 0, never start polling.");
> >>> +
> >>> +static int poll_stop_idle = 3*HZ; /* 3 seconds */
> >>> +module_param(poll_stop_idle, int, S_IRUGO|S_IWUSR);
> >>> +MODULE_PARM_DESC(poll_stop_idle, "Stop continuous polling of virtqueue after this many jiffies of no work.");
> >>>
> >> I'm not sure using jiffies is good enough, since the user needs to know
> >> the HZ value. It may be worth looking at sk_busy_loop(), which uses
> >> sched_clock() and microseconds.
> >
> > Ok, will look into it, thanks.
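To illustrate what that suggestion could look like (purely a sketch with hypothetical names, not code from the patch or from sk_busy_loop()): keep the idle limit in microseconds and measure elapsed time with sched_clock(), so the admin never needs to know HZ.

/* Sketch: a microsecond-based idle check in the spirit of sk_busy_loop().
 * poll_stop_idle_us and vqpoll_idle_too_long() are hypothetical names.
 */
static unsigned int poll_stop_idle_us = 3 * USEC_PER_SEC;	/* 3 seconds */
module_param(poll_stop_idle_us, uint, S_IRUGO|S_IWUSR);
MODULE_PARM_DESC(poll_stop_idle_us, "Stop continuous polling of virtqueue after this many microseconds of no work.");

static bool vqpoll_idle_too_long(u64 last_work_ns)
{
	/* sched_clock() returns nanoseconds since boot */
	return sched_clock() - last_work_ns >
	       (u64)poll_stop_idle_us * NSEC_PER_USEC;
}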
> >>> +/* Enable or disable virtqueue polling (vqpoll.enabled) for a virtqueue.
> >>> + *
> >>> + * Enabling this mode tells the guest not to notify ("kick") us when it
> >>> + * has made more work available on this virtqueue; rather, we will
> >>> + * continuously poll this virtqueue in the worker thread. If multiple
> >>> + * virtqueues are polled, the worker thread polls them all, e.g., in a
> >>> + * round-robin fashion.
> >>> + * Note that vqpoll.enabled doesn't always mean that this virtqueue is
> >>> + * actually being polled: the backend (e.g., net.c) may temporarily
> >>> + * disable it using vhost_disable/enable_notify(), while vqpoll.enabled
> >>> + * is unchanged.
> >>> + *
> >>> + * It is assumed that these functions are called relatively rarely, when
> >>> + * vhost notices that this virtqueue's usage pattern significantly changed
> >>> + * in a way that makes polling more efficient than notification, or vice
> >>> + * versa.
> >>> + * Also, we assume that vhost_vq_disable_vqpoll() is always called on vq
> >>> + * cleanup, so any allocations done by vhost_vq_enable_vqpoll() can be
> >>> + * reclaimed.
> >>> + */
> >>> +static void vhost_vq_enable_vqpoll(struct vhost_virtqueue *vq)
> >>> +{
> >>> +	if (vq->vqpoll.enabled)
> >>> +		return; /* already enabled, nothing to do */
> >>> +	if (!vq->handle_kick)
> >>> +		return; /* polling will be a waste of time if no callback! */
> >>> +	if (!(vq->used_flags & VRING_USED_F_NO_NOTIFY)) {
> >>> +		/* vq has guest notifications enabled. Disable them,
> >>> +		   and instead add vq to the polling list */
> >>> +		list_add_tail(&vq->vqpoll.link, &vq->dev->vqpoll_list);
> >> This may work when there are at most two vqs in the list. But consider
> >> that you may want to poll a lot of vqs in the future; it may take a long
> >> time for this vq to get polled. So probably we can just keep used_flags
> >> untouched: if the vq gets kicked, it can be served soon.
> >
> > Indeed there is a patch ready for polling multiple virtqueues, and it has
> > a better scheduling algorithm that avoids virtqueue starvation.
>
> Ok.
>
> >>> +	}
> >>> +	vq->vqpoll.jiffies_last_kick = jiffies;
> >>> +	__get_user(vq->avail_idx, &vq->avail->idx);
> >>> +	vq->vqpoll.enabled = true;
> >>> +
> >>> +	/* Map userspace's vq->avail to the kernel's memory space. */
> >>> +	if (get_user_pages_fast((unsigned long)vq->avail, 1, 0,
> >>> +				&vq->vqpoll.avail_page) != 1) {
> >>> +		/* TODO: can this happen, as we check access
> >>> +		   to vq->avail in advance? */
> >>> +		BUG();
> >>> +	}
> >>> +	vq->vqpoll.avail_mapped = (struct vring_avail *) (
> >>> +		(unsigned long)kmap(vq->vqpoll.avail_page) |
> >>> +		((unsigned long)vq->avail & ~PAGE_MASK));
> >> Is it a must to map the avail page here?
> >>
> > No. This is indeed in preparation for the next patch handling multiple
> > queues by a single vhost thread, where we'd like to map these pages for
> > performance.
>
> Makes sense. Then for the gup failure, we can just safely disable polling
> in that case.
>
>
> >>> +		    vq->vqpoll.work_this_jiffy >=
> >>> +			poll_start_rate) {
> >>> +			vhost_vq_enable_vqpoll(vq);
> >>> +		}
> >>> +	}
> >>> +	/* Check one virtqueue from the round-robin list */
> >>> +	if (!list_empty(&dev->vqpoll_list)) {
> >> If we still have other work in work_list, we may want to serve it first.
> >
> > You may be right. We've done a lot of experiments with this method, which
> > seems to work well. I prefer leaving it this way for now, but your
> > approach is worthwhile to investigate as well.
>
> Ok.
>
> >> [...]
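For what it's worth, here is a rough sketch of a worker iteration that serves pending work first and only then polls one virtqueue, rotating it to the tail of the list so no vq starves. This is illustrative only; the multi-vq scheduling patch Razya mentions may well do it differently, and the helper name and vhost_vq_poll_once() (from the sketch earlier in this mail) are mine.

/* Sketch of one worker iteration: entries on work_list win over polling,
 * and a polled vq is rotated to the tail of vqpoll_list so every polled vq
 * eventually gets its turn.
 */
static void vhost_poll_round_robin(struct vhost_dev *dev)
{
	struct vhost_virtqueue *vq;

	if (!list_empty(&dev->work_list))
		return;		/* real work queued: serve it first */
	if (list_empty(&dev->vqpoll_list))
		return;		/* nothing is being polled */

	vq = list_first_entry(&dev->vqpoll_list,
			      struct vhost_virtqueue, vqpoll.link);
	list_move_tail(&vq->vqpoll.link, &dev->vqpoll_list);	/* rotate */
	vhost_vq_poll_once(vq);
}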
> >>>  struct vhost_dev {
> >>> @@ -123,6 +151,7 @@ struct vhost_dev {
> >>>  	spinlock_t work_lock;
> >>>  	struct list_head work_list;
> >>>  	struct task_struct *worker;
> >>> +	struct list_head vqpoll_list;
> >>>  };
> >>>
> >>>  void vhost_dev_init(struct vhost_dev *, struct vhost_virtqueue **vqs, int nvqs);
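For completeness, here is the kind of setup the new field implies (only a sketch of the obvious initialization, not a quote from the patch): vqpoll_list needs an INIT_LIST_HEAD() in vhost_dev_init(), and each vq's vqpoll.link needs one too, so that list_empty() and list_del() are well-defined before a vq is ever polled.

/* Sketch: fragment to be placed in vhost_dev_init(), after dev->vqs and
 * dev->nvqs have been set. 'i' is the loop counter already used there.
 */
	INIT_LIST_HEAD(&dev->vqpoll_list);
	for (i = 0; i < dev->nvqs; ++i)
		INIT_LIST_HEAD(&dev->vqs[i]->vqpoll.link);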