Re: [PATCH] vhost: Add polling mode

On 07/23/2014 04:48 PM, Abel Gordon wrote:
> On Wed, Jul 23, 2014 at 11:42 AM, Jason Wang <jasowang@xxxxxxxxxx> wrote:
>>
>> On 07/23/2014 04:12 PM, Razya Ladelsky wrote:
>>> Jason Wang <jasowang@xxxxxxxxxx> wrote on 23/07/2014 08:26:36 AM:
>>>
>>>> From: Jason Wang <jasowang@xxxxxxxxxx>
>>>> To: Razya Ladelsky/Haifa/IBM@IBMIL, kvm@xxxxxxxxxxxxxxx,
>>>>     "Michael S. Tsirkin" <mst@xxxxxxxxxx>
>>>> Cc: abel.gordon@xxxxxxxxx, Joel Nider/Haifa/IBM@IBMIL,
>>>>     Yossi Kuperman1/Haifa/IBM@IBMIL, Eran Raichstein/Haifa/IBM@IBMIL,
>>>>     Alex Glikson/Haifa/IBM@IBMIL
>>>> Date: 23/07/2014 08:26 AM
>>>> Subject: Re: [PATCH] vhost: Add polling mode
>>>>
>>>> On 07/21/2014 09:23 PM, Razya Ladelsky wrote:
>>>>> Hello All,
>>>>>
>>>>> When vhost is waiting for buffers from the guest driver (e.g., more
>>>>> packets to send in vhost-net's transmit queue), it normally goes to
>>>>> sleep and waits for the guest to "kick" it. This kick involves a PIO
>>>>> in the guest, and therefore an exit (and possibly userspace
>>>>> involvement in translating this PIO exit into a file descriptor
>>>>> event), all of which hurts performance.
>>>>>
>>>>> If the system is under-utilized (has cpu time to spare), vhost can
>>>>> continuously poll the virtqueues for new buffers, and avoid asking
>>>>> the guest to kick us.
>>>>> This patch adds an optional polling mode to vhost, which can be
>>>>> enabled via a kernel module parameter, "poll_start_rate".
>>>>>
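(For context on the knob mentioned above: a minimal sketch of how such a
parameter is typically declared in a kernel module. The name comes from
the cover letter; the type, default, and permissions are my guesses, not
necessarily what the patch does.)

    /* Sketch only -- not the actual patch code. */
    #include <linux/module.h>
    #include <linux/moduleparam.h>

    /* 0 (default) leaves polling disabled */
    static int poll_start_rate = 0;
    module_param(poll_start_rate, int, 0444);
    MODULE_PARM_DESC(poll_start_rate,
            "Rate of work above which a virtqueue starts being polled");
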
>>>>> When polling is active for a virtqueue, the guest is asked to
>>>>> disable notification (kicks), and the worker thread continuously
>>>>> checks for new buffers. When it does discover new buffers, it
>>>>> simulates a "kick" by invoking the underlying backend driver (such
>>>>> as vhost-net), which thinks it got a real kick from the guest, and
>>>>> acts accordingly. If the underlying driver asks not to be kicked,
>>>>> we disable polling on this virtqueue.
>>>>>
>>>>> We start polling on a virtqueue when we notice it has work to do.
>>>>> Polling on this virtqueue is later disabled after 3 seconds of
>>>>> polling turning up no new work, as in this case we are better off
>>>>> returning to the exit-based notification mechanism. The default
>>>>> timeout of 3 seconds can be changed with the "poll_stop_idle"
>>>>> kernel module parameter.
>>>>>
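(To make the start/stop flow concrete, a rough sketch of the loop
described above. guest_notify_disable/enable, vq_has_work and
vhost_kick_backend are invented helper names standing in for the real
vhost internals; locking and the worker-thread plumbing are omitted.)

    /* Sketch only: poll a virtqueue until poll_stop_idle jiffies pass
     * with no new work, then fall back to kick-based notification. */
    static void poll_virtqueue(struct vhost_virtqueue *vq)
    {
            unsigned long last_work = jiffies;

            guest_notify_disable(vq);    /* ask the guest not to kick us */

            while (!time_after(jiffies, last_work + poll_stop_idle)) {
                    if (vq_has_work(vq)) {
                            /* simulate a guest "kick" into the backend */
                            vhost_kick_backend(vq);
                            last_work = jiffies;
                    }
                    cond_resched();      /* don't monopolize the cpu */
            }

            guest_notify_enable(vq);     /* back to exit-based kicks */
    }
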
>>>>> This polling approach makes a lot of sense for new HW with posted
>>>>> interrupts, for which we have exitless host-to-guest notifications.
>>>>> But even with support for posted interrupts, guest-to-host
>>>>> communication still causes exits. Polling adds the missing part.
>>>>>
>>>>> When systems are overloaded, there won't be enough cpu time for the
>>>>> various vhost threads to poll their guests' devices. For these
>>>>> scenarios, we plan to add support for vhost threads that can be
>>>>> shared by multiple devices, even of multiple vms.
>>>>> Our ultimate goal is to implement the I/O acceleration features
>>>>> described in:
>>>>> KVM Forum 2013: Efficient and Scalable Virtio (by Abel Gordon)
>>>>> https://www.youtube.com/watch?v=9EyweibHfEs
>>>>> and
>>>>> https://www.mail-archive.com/kvm@xxxxxxxxxxxxxxx/msg98179.html
>>>>>
>>>>>
>>>>> Comments are welcome,
>>>>> Thank you,
>>>>> Razya
>>>> Thanks for the work. Do you have perf numbers for this?
>>>>
>>> Hi Jason,
>>> Thanks for reviewing. I ran some experiments with TCP stream netperf
>>> and filebench (having 2 threads performing random reads) benchmarks
>>> on an IBM System x3650 M4.
>>> All runs loaded the guests in a way that they were (cpu) saturated.
>>> The system had two cores per guest, so as to allow both the vcpu and
>>> the vhost thread to run concurrently for maximum throughput (but I
>>> didn't pin the threads to specific cores).
>>> I get:
>>>
>>> Netperf, 1 vm:
>>> The polling patch improved throughput by ~33%. Number of exits/sec
>>> decreased 6x.
>>> The same improvement was shown when I tested with 3 vms running
>>> netperf.
>>>
>>> filebench, 1 vm:
>>> ops/sec improved by 13% with the polling patch. Number of exits was
>>> reduced by 31%.
>>> The same experiment with 3 vms running filebench showed similar
>>> numbers.
>>
>> Looks good. It may be worth adding these results to the commit log.
>>>
>>>> And it looks like the patch only polls the virtqueue. In the future,
>>>> it may be worth adding callbacks for vhost-net to poll the socket as
>>>> well. Then it could be used with rx busy polling in the host, which
>>>> may speed up the rx path too.
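(Just to sketch that idea: a backend hook could look something like the
struct below. The type and field names are invented for illustration,
not an existing vhost API.)

    /* Hypothetical: let the vhost worker busy-poll the backend (e.g.
     * the tun/macvtap socket) in addition to the virtqueue. */
    struct vhost_backend_poll_ops {
            /* return true if the backend has pending rx work */
            bool (*peek)(void *backend_priv);
    };
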
>>> Did you mean polling the network device to avoid interrupts?
>>
>> Yes, recent Linux hosts support rx busy polling, which can reduce
>> interrupts. If vhost can utilize this, it can also reduce the latency
>> caused by vhost thread wakeups.
>>
>> And I'm also working on virtio-net busy polling in the guest; if vhost
>> can poll the socket, it can also help guest rx polling.
> Nice :)  Note that you may want to check whether the processor supports
> posted interrupts. I guess that if the CPU supports posted interrupts,
> then the benefits of polling in the front-end (from a performance
> perspective) may not be worth the cpu cycles wasted in the guest.
>

Yes, it's worth checking. But I think busy polling in the guest may still
help, since it may reduce the overhead of irq handling and NAPI in the
guest, and can also reduce latency by eliminating wakeups of both the
vcpu thread in the host and the userspace process in the guest.