Re: [PATCH] vhost: Add polling mode

On 07/23/2014 04:12 PM, Razya Ladelsky wrote:
> Jason Wang <jasowang@xxxxxxxxxx> wrote on 23/07/2014 08:26:36 AM:
>
>> From: Jason Wang <jasowang@xxxxxxxxxx>
>> To: Razya Ladelsky/Haifa/IBM@IBMIL, kvm@xxxxxxxxxxxxxxx, "Michael S.
>> Tsirkin" <mst@xxxxxxxxxx>, 
>> Cc: abel.gordon@xxxxxxxxx, Joel Nider/Haifa/IBM@IBMIL, Yossi 
>> Kuperman1/Haifa/IBM@IBMIL, Eran Raichstein/Haifa/IBM@IBMIL, Alex 
>> Glikson/Haifa/IBM@IBMIL
>> Date: 23/07/2014 08:26 AM
>> Subject: Re: [PATCH] vhost: Add polling mode
>>
>> On 07/21/2014 09:23 PM, Razya Ladelsky wrote:
>>> Hello All,
>>>
>>> When vhost is waiting for buffers from the guest driver (e.g., more
>>> packets to send in vhost-net's transmit queue), it normally goes to
>>> sleep and waits for the guest to "kick" it. This kick involves a PIO
>>> in the guest, and therefore an exit (and possibly userspace
>>> involvement in translating this PIO exit into a file descriptor
>>> event), all of which hurts performance.
>>>
>>> If the system is under-utilized (has cpu time to spare), vhost can
>>> continuously poll the virtqueues for new buffers, and avoid asking
>>> the guest to kick us.
>>> This patch adds an optional polling mode to vhost, which can be
>>> enabled via a kernel module parameter, "poll_start_rate".
>>>
>>> When polling is active for a virtqueue, the guest is asked to disable
>>> notification (kicks), and the worker thread continuously checks for
>>> new buffers. When it does discover new buffers, it simulates a "kick"
>>> by invoking the underlying backend driver (such as vhost-net), which
>>> thinks it got a real kick from the guest, and acts accordingly. If the
>>> underlying driver asks not to be kicked, we disable polling on this
>>> virtqueue.
>>>
>>> We start polling on a virtqueue when we notice it has work to do.
>>> Polling on this virtqueue is later disabled after 3 seconds of polling
>>> turning up no new work, as in this case we are better off returning to
>>> the exit-based notification mechanism. The default timeout of 3
>>> seconds can be changed with the "poll_stop_idle" kernel module
>>> parameter.
>>>
>>> This polling approach makes a lot of sense for new HW with
>>> posted-interrupts, for which we have exitless host-to-guest
>>> notifications. But even with support for posted interrupts,
>>> guest-to-host communication still causes exits. Polling adds the
>>> missing part.
>>>
>>> When systems are overloaded, there won't be enough cpu time for the
>>> various vhost threads to poll their guests' devices. For these
>>> scenarios, we plan to add support for vhost threads that can be shared
>>> by multiple devices, even across multiple VMs.
>>> Our ultimate goal is to implement the I/O acceleration features
>>> described in:
>>> KVM Forum 2013: Efficient and Scalable Virtio (by Abel Gordon)
>>> https://www.youtube.com/watch?v=9EyweibHfEs
>>> and
>>> https://www.mail-archive.com/kvm@xxxxxxxxxxxxxxx/msg98179.html
>>>
>>>
>>> Comments are welcome, 
>>> Thank you,
>>> Razya
>> Thanks for the work. Do you have perf numbers for this?
>>
> Hi Jason,
> Thanks for reviewing. I ran some experiments with TCP stream netperf
> and filebench (with 2 threads performing random reads) benchmarks on an
> IBM System x3650 M4.
> All runs loaded the guests in a way that they were (cpu) saturated.
> The system had two cores per guest, so as to allow both the vcpu and
> the vhost thread to run concurrently for maximum throughput (but I
> didn't pin the threads to specific cores).
> I get:
>
> Netperf, 1 vm:
> The polling patch improved throughput by ~33%. Number of exits/sec 
> decreased 6x.
> The same improvement was shown when I tested with 3 vms running netperf.
>
> filebench, 1 vm:
> ops/sec improved by 13% with the polling patch. Number of exits was 
> reduced by 31%.
> The same experiment with 3 vms running filebench showed similar numbers.

Looks good. It may be worth adding these results to the commit log.
>
>> And it looks like the patch only polls the virtqueue. In the future,
>> it may be worth adding callbacks for vhost_net to poll the socket as
>> well. Then it could be used with rx busy polling in the host, which
>> may speed up rx too.
> Did you mean polling the network device to avoid interrupts?

Yes, recent Linux hosts support rx busy polling, which can reduce
interrupts. If vhost can utilize this, it can also reduce the latency
caused by vhost thread wakeups.

And I'm also working on virtio-net busy polling in the guest; if vhost
can poll the socket, it would also help guest rx polling.
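
Roughly what I have in mind for the socket side (just a sketch, untested,
and the helper name is made up; not something this patch has to do):

#include <net/busy_poll.h>

/* If vhost-net's polling pass also busy-polled the backing socket,
 * rx packets could be picked up without waiting for an interrupt
 * plus a vhost thread wakeup. */
static void vhost_net_try_busy_poll(struct socket *sock)
{
        if (sock && sk_can_busy_loop(sock->sk))
                sk_busy_loop(sock->sk, 1 /* nonblock */);
}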
>>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>>> index c90f437..678d766 100644
>>> --- a/drivers/vhost/vhost.c
>>> +++ b/drivers/vhost/vhost.c
>>> @@ -24,9 +24,17 @@
>>>  #include <linux/slab.h>
>>>  #include <linux/kthread.h>
>>>  #include <linux/cgroup.h>
>>> +#include <linux/jiffies.h>
>>>  #include <linux/module.h>
>>>
>>>  #include "vhost.h"
>>> +static int poll_start_rate = 0;
>>> +module_param(poll_start_rate, int, S_IRUGO|S_IWUSR);
>>> +MODULE_PARM_DESC(poll_start_rate, "Start continuous polling of virtqueue when rate of events is at least this number per jiffy. If 0, never start polling.");
>>> +
>>> +static int poll_stop_idle = 3*HZ; /* 3 seconds */
>>> +module_param(poll_stop_idle, int, S_IRUGO|S_IWUSR);
>>> +MODULE_PARM_DESC(poll_stop_idle, "Stop continuous polling of virtqueue after this many jiffies of no work.");
>>>
>> I'm not sure using jiffies is good enough, since the user needs to
>> know the HZ value. It may be worth looking at sk_busy_loop(), which
>> uses sched_clock() and microseconds.
> OK, will look into it, thanks.
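
E.g. something along these lines (rough, untested sketch; the parameter
and the vqpoll.last_work_ns field are made-up names), keeping the
timeout in microseconds and comparing against sched_clock():

static int poll_stop_idle_us = 3 * USEC_PER_SEC; /* 3 seconds */
module_param(poll_stop_idle_us, int, S_IRUGO|S_IWUSR);

/* true if this vq has turned up no work for poll_stop_idle_us us;
 * vqpoll.last_work_ns would be refreshed from sched_clock() whenever
 * polling finds new buffers. */
static bool vhost_vq_poll_idle(struct vhost_virtqueue *vq)
{
        return sched_clock() - vq->vqpoll.last_work_ns >
               (u64)poll_stop_idle_us * NSEC_PER_USEC;
}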
>
>>> +/* Enable or disable virtqueue polling (vqpoll.enabled) for a virtqueue.
>>> + *
>>> + * Enabling this mode tells the guest not to notify ("kick") us when it
>>> + * has made more work available on this virtqueue; rather, we will
>>> + * continuously poll this virtqueue in the worker thread. If multiple
>>> + * virtqueues are polled, the worker thread polls them all, e.g., in a
>>> + * round-robin fashion.
>>> + * Note that vqpoll.enabled doesn't always mean that this virtqueue is
>>> + * actually being polled: the backend (e.g., net.c) may temporarily
>>> + * disable it using vhost_disable/enable_notify(), while vqpoll.enabled
>>> + * is unchanged.
>>> + *
>>> + * It is assumed that these functions are called relatively rarely, when
>>> + * vhost notices that this virtqueue's usage pattern significantly
>>> + * changed in a way that makes polling more efficient than notification,
>>> + * or vice versa.
>>> + * Also, we assume that vhost_vq_disable_vqpoll() is always called on vq
>>> + * cleanup, so any allocations done by vhost_vq_enable_vqpoll() can be
>>> + * reclaimed.
>>> + */
>>> +static void vhost_vq_enable_vqpoll(struct vhost_virtqueue *vq)
>>> +{
>>> +       if (vq->vqpoll.enabled)
>>> +               return; /* already enabled, nothing to do */
>>> +       if (!vq->handle_kick)
>>> +               return; /* polling will be a waste of time if no callback! */
>>> +       if (!(vq->used_flags & VRING_USED_F_NO_NOTIFY)) {
>>> +               /* vq has guest notifications enabled. Disable them,
>>> +                  and instead add vq to the polling list */
>>>
>>> +               list_add_tail(&vq->vqpoll.link, &vq->dev->vqpoll_list);
>> This may work when there are at most two vqs in the list. But if you
>> want to poll a lot of vqs in the future, it may take a long time for
>> this vq to get polled. So we can probably just keep used_flags
>> untouched; if the vq gets kicked, it can be served soon.
> Indeed there is a patch ready for polling multiple virtqueues, and it
> has a better scheduling algorithm that avoids virtqueue starvation.

Ok.
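
For reference, even a simple rotation of the list on every pass avoids
starvation with many vqs, roughly (sketch only; vhost_vq_poll() is a
made-up name for "check avail and call handle_kick"):

        if (!list_empty(&dev->vqpoll_list)) {
                struct vhost_virtqueue *vq;

                vq = list_first_entry(&dev->vqpoll_list,
                                      struct vhost_virtqueue, vqpoll.link);
                /* rotate so the next pass picks a different vq */
                list_move_tail(&vq->vqpoll.link, &dev->vqpoll_list);
                vhost_vq_poll(vq);
        }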
>
>
>>> +       }
>>> +       vq->vqpoll.jiffies_last_kick = jiffies;
>>> +       __get_user(vq->avail_idx, &vq->avail->idx); 
>>> +       vq->vqpoll.enabled = true;
>>> +
>>> +       /* Map userspace's vq->avail to the kernel's memory space. */
>>> +       if (get_user_pages_fast((unsigned long)vq->avail, 1, 0,
>>> +               &vq->vqpoll.avail_page) != 1) {
>>> +               /* TODO: can this happen, as we check access
>>> +               to vq->avail in advance? */
>>> +               BUG();
>>> +       }
>>> +       vq->vqpoll.avail_mapped = (struct vring_avail *) (
>>> +               (unsigned long)kmap(vq->vqpoll.avail_page) |
>>> +               ((unsigned long)vq->avail & ~PAGE_MASK));
>> Is it a must to map avail page here?
>>
> No. This is indeed in preparation for the next patch handling multiple 
> queues by a single vhost thread, where we'd like to map these pages for
> performance. 

Makes sense. Then on a gup failure, we can just safely disable polling
for this vq.
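
I.e. do the pinning before touching used_flags or the polling list, and
just bail out if it fails, something like (sketch):

        /* can't pin the avail page: leave guest notifications enabled
         * and simply don't enter polling mode for this vq */
        if (get_user_pages_fast((unsigned long)vq->avail, 1, 0,
                                &vq->vqpoll.avail_page) != 1)
                return;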
>
>
>
>>> +                               vq->vqpoll.work_this_jiffy >=
>>> +                                       poll_start_rate) {
>>> +                               vhost_vq_enable_vqpoll(vq);
>>> +                       }
>>> +               }
>>> +               /* Check one virtqueue from the round-robin list */
>>> +               if (!list_empty(&dev->vqpoll_list)) {
>> If we still have other work in work_list, we may want to serve it
>> first.
>
> You may be right. We've done a lot of experiments with this method,
> which seems to work well. I prefer leaving it this way for now, but
> your approach is worthwhile to investigate as well.

Ok.
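
Just to clarify what I meant, the worker loop could prefer queued work
and only poll when idle, roughly (simplified sketch of vhost_worker();
vhost_poll_one_vq() is a made-up helper for the polling pass):

        struct vhost_work *work;

        spin_lock_irq(&dev->work_lock);
        if (!list_empty(&dev->work_list)) {
                work = list_first_entry(&dev->work_list,
                                        struct vhost_work, node);
                list_del_init(&work->node);
        } else {
                work = NULL;
        }
        spin_unlock_irq(&dev->work_lock);

        if (work)
                work->fn(work);
        else if (!list_empty(&dev->vqpoll_list))
                vhost_poll_one_vq(dev); /* poll only when no queued work */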
>
>> [...]
>>>  struct vhost_dev {
>>> @@ -123,6 +151,7 @@ struct vhost_dev {
>>>         spinlock_t work_lock;
>>>         struct list_head work_list;
>>>         struct task_struct *worker;
>>> +        struct list_head vqpoll_list;
>>>  };
>>>
>>>  void vhost_dev_init(struct vhost_dev *, struct vhost_virtqueue **vqs, int nvqs);
