Re: [PATCH] vhost: Add polling mode

Jason Wang <jasowang@xxxxxxxxxx> wrote on 23/07/2014 08:26:36 AM:

> From: Jason Wang <jasowang@xxxxxxxxxx>
> To: Razya Ladelsky/Haifa/IBM@IBMIL, kvm@xxxxxxxxxxxxxxx, "Michael S.
> Tsirkin" <mst@xxxxxxxxxx>, 
> Cc: abel.gordon@xxxxxxxxx, Joel Nider/Haifa/IBM@IBMIL, Yossi 
> Kuperman1/Haifa/IBM@IBMIL, Eran Raichstein/Haifa/IBM@IBMIL, Alex 
> Glikson/Haifa/IBM@IBMIL
> Date: 23/07/2014 08:26 AM
> Subject: Re: [PATCH] vhost: Add polling mode
> 
> On 07/21/2014 09:23 PM, Razya Ladelsky wrote:
> > Hello All,
> >
> > When vhost is waiting for buffers from the guest driver (e.g., more 
> > packets
> > to send in vhost-net's transmit queue), it normally goes to sleep and 
> > waits
> > for the guest to "kick" it. This kick involves a PIO in the guest, and
> > therefore an exit (and possibly userspace involvement in translating
> > this PIO exit into a file descriptor event), all of which hurts
> > performance.
> >
> > If the system is under-utilized (has cpu time to spare), vhost can 
> > continuously poll the virtqueues for new buffers, and avoid asking 
> > the guest to kick us.
> > This patch adds an optional polling mode to vhost, that can be enabled
> > via a kernel module parameter, "poll_start_rate".
> >
> > When polling is active for a virtqueue, the guest is asked to
> > disable notification (kicks), and the worker thread continuously checks
> > for new buffers. When it does discover new buffers, it simulates a "kick"
> > by invoking the underlying backend driver (such as vhost-net), which
> > thinks it got a real kick from the guest, and acts accordingly. If the
> > underlying driver asks not to be kicked, we disable polling on this
> > virtqueue.
> >
> > We start polling on a virtqueue when we notice it has
> > work to do. Polling on this virtqueue is later disabled after 3 seconds
> > of polling turning up no new work, as in this case we are better off
> > returning to the exit-based notification mechanism. The default timeout
> > of 3 seconds can be changed with the "poll_stop_idle" kernel module
> > parameter.
> >
> > This polling approach makes a lot of sense for new HW with
> > posted-interrupts, for which we have exitless host-to-guest
> > notifications. But even with support for posted interrupts,
> > guest-to-host communication still causes exits.
> > Polling adds the missing part.
> >
> > When systems are overloaded, there won't be enough cpu time for the
> > various vhost threads to poll their guests' devices. For these
> > scenarios, we plan to add support for vhost threads that can be shared
> > by multiple devices, even of multiple vms.
> > Our ultimate goal is to implement the I/O acceleration features
> > described in:
> > KVM Forum 2013: Efficient and Scalable Virtio (by Abel Gordon) 
> > https://www.youtube.com/watch?v=9EyweibHfEs
> > and
> > https://www.mail-archive.com/kvm@xxxxxxxxxxxxxxx/msg98179.html
> >
> > 
> > Comments are welcome, 
> > Thank you,
> > Razya
> 
> Thanks for the work. Do you have perf numbers for this?
> 

Hi Jason,
Thanks for reviewing. I ran some experiments with TCP stream netperf and 
filebench (having 2 threads performing random reads) benchmarks on an IBM 
System x3650 M4.
All runs loaded the guests in a way that they were (cpu) saturated.
The system had two cores per guest, so as to allow both the vcpu and the
vhost thread to run concurrently for maximum throughput (but I didn't pin
the threads to specific cores).
I get:

Netperf, 1 vm:
The polling patch improved throughput by ~33%. Number of exits/sec 
decreased 6x.
The same improvement was shown when I tested with 3 vms running netperf.

filebench, 1 vm:
ops/sec improved by 13% with the polling patch. Number of exits was 
reduced by 31%.
The same experiment with 3 vms running filebench showed similar numbers.


> And looks like the patch only poll for virtqueue. In the future, may
> worth to add callbacks for vhost_net to poll socket. Then it could be
> used with rx busy polling in host which may speedup the rx also.

Did you mean polling the network device to avoid interrupts?

> > 
> > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > index c90f437..678d766 100644
> > --- a/drivers/vhost/vhost.c
> > +++ b/drivers/vhost/vhost.c
> > @@ -24,9 +24,17 @@
> >  #include <linux/slab.h>
> >  #include <linux/kthread.h>
> >  #include <linux/cgroup.h>
> > +#include <linux/jiffies.h>
> >  #include <linux/module.h>
> > 
> >  #include "vhost.h"
> > +static int poll_start_rate = 0;
> > +module_param(poll_start_rate, int, S_IRUGO|S_IWUSR);
> > +MODULE_PARM_DESC(poll_start_rate, "Start continuous polling of virtqueue when rate of events is at least this number per jiffy. If 0, never start polling.");
> > +
> > +static int poll_stop_idle = 3*HZ; /* 3 seconds */
> > +module_param(poll_stop_idle, int, S_IRUGO|S_IWUSR);
> > +MODULE_PARM_DESC(poll_stop_idle, "Stop continuous polling of virtqueue after this many jiffies of no work.");
> > 
> 
> I'm not sure using jiffy is good enough since user need know HZ value.
> May worth to look at sk_busy_loop() which use sched_clock() and us. 

OK, will look into it, thanks.
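
For concreteness, here is a rough sketch of what a microsecond-based stop
condition could look like, using sched_clock() as sk_busy_loop() does. This
is not part of the posted patch: the "poll_stop_idle_usecs" parameter and the
vqpoll.ns_of_last_work field are made-up names for illustration only.

/* Sketch only: idle accounting via sched_clock() instead of jiffies, so
 * users don't need to know HZ. Parameter and field names are hypothetical.
 */
#include <linux/module.h>
#include <linux/sched.h>        /* sched_clock() */
#include <linux/time.h>         /* USEC_PER_SEC, NSEC_PER_USEC */
#include "vhost.h"              /* assumes a vq->vqpoll.ns_of_last_work field */

static unsigned int poll_stop_idle_usecs = 3 * USEC_PER_SEC; /* 3 seconds */
module_param(poll_stop_idle_usecs, uint, S_IRUGO|S_IWUSR);
MODULE_PARM_DESC(poll_stop_idle_usecs, "Stop continuous polling of virtqueue after this many microseconds of no work.");

/* Called whenever polling finds new work on the virtqueue. */
static void vqpoll_note_work(struct vhost_virtqueue *vq)
{
        vq->vqpoll.ns_of_last_work = sched_clock();
}

/* True once the virtqueue has been idle long enough to stop polling it. */
static bool vqpoll_should_stop(struct vhost_virtqueue *vq)
{
        u64 idle_ns = sched_clock() - vq->vqpoll.ns_of_last_work;

        return idle_ns > (u64)poll_stop_idle_usecs * NSEC_PER_USEC;
}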

> > 
> > +/* Enable or disable virtqueue polling (vqpoll.enabled) for a virtqueue.
> > + *
> > + * Enabling this mode tells the guest not to notify ("kick") us when it
> > + * has made more work available on this virtqueue; Rather, we will continuously
> > + * poll this virtqueue in the worker thread. If multiple virtqueues are polled,
> > + * the worker thread polls them all, e.g., in a round-robin fashion.
> > + * Note that vqpoll.enabled doesn't always mean that this virtqueue is
> > + * actually being polled: The backend (e.g., net.c) may temporarily disable it
> > + * using vhost_disable/enable_notify(), while vqpoll.enabled is unchanged.
> > + *
> > + * It is assumed that these functions are called relatively rarely, when vhost
> > + * notices that this virtqueue's usage pattern significantly changed in a way
> > + * that makes polling more efficient than notification, or vice versa.
> > + * Also, we assume that vhost_vq_disable_vqpoll() is always called on vq
> > + * cleanup, so any allocations done by vhost_vq_enable_vqpoll() can be
> > + * reclaimed.
> > + */
> > +static void vhost_vq_enable_vqpoll(struct vhost_virtqueue *vq)
> > +{
> > +       if (vq->vqpoll.enabled)
> > +               return; /* already enabled, nothing to do */
> > +       if (!vq->handle_kick)
> > +               return; /* polling will be a waste of time if no callback! */
> > +       if (!(vq->used_flags & VRING_USED_F_NO_NOTIFY)) {
> > +               /* vq has guest notifications enabled. Disable them,
> > +                  and instead add vq to the polling list */
> >
> > +               list_add_tail(&vq->vqpoll.link, &vq->dev->vqpoll_list);
> 
> This may work when there're at most two vqs in the list. But consider
> you may want to poll a lot of vqs in the future, it may take a long time
> for this vq to get polled. So probably we can just keep the used_flags
> untouched, if the vq get kicked, it can be served soon.

Indeed there is a patch ready for polling multiple virtqueues, and it has
a better scheduling algorithm that avoids virtqueue starvation.
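
To make the round-robin idea concrete, here is a minimal sketch (not the
scheduler from that follow-up patch) built on this patch's dev->vqpoll_list
and vq->vqpoll.link fields: after a virtqueue is polled it is rotated to the
tail of the list, so every polled virtqueue eventually gets a turn.

/* Illustrative round-robin pass: poll the head of dev->vqpoll_list, then
 * rotate it to the tail so no virtqueue starves. A real implementation
 * would first check that the vq actually has new buffers.
 */
#include <linux/list.h>
#include "vhost.h"

static void vhost_poll_one_round_robin(struct vhost_dev *dev)
{
        struct vhost_virtqueue *vq;

        if (list_empty(&dev->vqpoll_list))
                return;

        vq = list_first_entry(&dev->vqpoll_list,
                              struct vhost_virtqueue, vqpoll.link);

        /* Simulate a guest kick; the backend thinks it got a real one. */
        if (vq->handle_kick)
                vq->handle_kick(&vq->poll.work);

        /* Rotate: this vq moves to the back, another is polled next time. */
        list_move_tail(&vq->vqpoll.link, &dev->vqpoll_list);
}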


> > +       }
> > +       vq->vqpoll.jiffies_last_kick = jiffies;
> > +       __get_user(vq->avail_idx, &vq->avail->idx); 
> > +       vq->vqpoll.enabled = true;
> > +
> > +       /* Map userspace's vq->avail to the kernel's memory space. */
> > +       if (get_user_pages_fast((unsigned long)vq->avail, 1, 0,
> > +               &vq->vqpoll.avail_page) != 1) {
> > +               /* TODO: can this happen, as we check access
> > +               to vq->avail in advance? */
> > +               BUG();
> > +       }
> > +       vq->vqpoll.avail_mapped = (struct vring_avail *) (
> > +               (unsigned long)kmap(vq->vqpoll.avail_page) |
> > +               ((unsigned long)vq->avail & ~PAGE_MASK));
> 
> Is it a must to map avail page here?
> 

No. This is indeed in preparation for the next patch handling multiple 
queues by a single vhost thread, where we'd like to map these pages for
performance. 
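
For illustration, with vq->avail mapped as above the polling loop can check
for new buffers by reading the available index directly from the mapped page,
instead of going through __get_user() on every iteration. A sketch, assuming
the vqpoll.avail_mapped field set up in the hunk quoted above:

/* Sketch: read the guest-visible avail index through the kernel mapping. */
#include <linux/compiler.h>     /* ACCESS_ONCE */
#include <linux/virtio_ring.h>  /* struct vring_avail */
#include "vhost.h"

/* True if the guest has published buffers we have not yet seen. */
static bool vhost_vq_poll_avail(struct vhost_virtqueue *vq)
{
        u16 avail_idx;

        if (!vq->vqpoll.avail_mapped)
                return false;

        /* Reread on every call; the guest updates this index concurrently. */
        avail_idx = ACCESS_ONCE(vq->vqpoll.avail_mapped->idx);

        return avail_idx != vq->last_avail_idx;
}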



> > +                               vq->vqpoll.work_this_jiffy >=
> > +                                       poll_start_rate) {
> > +                               vhost_vq_enable_vqpoll(vq);
> > +                       }
> > +               }
> > +               /* Check one virtqueue from the round-robin list */
> > +               if (!list_empty(&dev->vqpoll_list)) {
> 
> If we still have another work in work_list, we may want to serve it
> first.

You may be right. We've done a lot of experiments with this method,
which seems to work well. I prefer leaving it this way for now, but your
approach is worthwhile to investigate as well.
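
For reference, a worker loop that drains queued work before spending time on
polling could look roughly like the sketch below. This is illustration only:
it reuses the existing work_lock/work_list/vhost_work fields, calls the
hypothetical vhost_poll_one_round_robin() helper from the earlier sketch, and
omits the flushing/seqcount handling of the real vhost_worker().

#include <linux/kthread.h>
#include <linux/sched.h>
#include <linux/spinlock.h>
#include "vhost.h"

static int vhost_worker_sketch(void *data)
{
        struct vhost_dev *dev = data;
        struct vhost_work *work;

        for (;;) {
                if (kthread_should_stop())
                        break;

                work = NULL;
                spin_lock_irq(&dev->work_lock);
                if (!list_empty(&dev->work_list)) {
                        work = list_first_entry(&dev->work_list,
                                                struct vhost_work, node);
                        list_del_init(&work->node);
                }
                spin_unlock_irq(&dev->work_lock);

                if (work) {
                        /* Queued work (e.g. a real kick) is served first. */
                        work->fn(work);
                        continue;
                }

                /* Only poll when no queued work is pending. */
                vhost_poll_one_round_robin(dev);

                if (need_resched())
                        schedule();
        }
        return 0;
}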


> [...]
> >  struct vhost_dev {
> > @@ -123,6 +151,7 @@ struct vhost_dev {
> >         spinlock_t work_lock;
> >         struct list_head work_list;
> >         struct task_struct *worker;
> > +        struct list_head vqpoll_list;
> >  };
> > 
> >  void vhost_dev_init(struct vhost_dev *, struct vhost_virtqueue **vqs, int nvqs);
> 




