Re: Fw: Benchmarking for vhost polling patch

> Hi Razya,
> Thanks for the update.
> So that's reasonable I think, and I think it makes sense
> to keep working on this in isolation - it's more
> manageable at this size.
> 
> The big questions in my mind:
> - What happens if system is lightly loaded?
>   E.g. a ping/pong benchmark. How much extra CPU are
>   we wasting?
> - We see the best performance on your system is with 10usec worth of
>   polling.
>   It's OK to be able to tune it for best performance, but
>   most people don't have the time or the inclination.
>   So what would be the best value for other CPUs?

The trade-off between extra CPU consumption and throughput gain depends on 
the polling timeout value (poll_stop_idle).
The best value to choose depends on the workload and on the system's 
hardware and configuration, so there is little we can say about it in 
advance.
The system administrator should enable this optimization with the awareness 
that polling consumes extra CPU cycles, as documented.
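
To make the trade-off concrete, here is a minimal user-space sketch of the 
idea (not the patch code itself): the vhost worker keeps checking the 
virtqueue instead of sleeping, and only falls back to guest notifications 
once poll_stop_idle time has passed with no work. The vq_* helper names and 
the 10usec default are illustrative assumptions.

#include <stdbool.h>
#include <stdint.h>
#include <time.h>

static uint64_t poll_stop_idle_ns = 10 * 1000;       /* ~10usec, tunable */

static uint64_t now_ns(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}

/* Stubs standing in for real virtqueue handling. */
static bool vq_has_work(void)      { return false; }
static void vq_process(void)       { }
static void vq_wait_for_kick(void) { }

static void vhost_worker_iteration(void)
{
        uint64_t idle_since = now_ns();

        /* Busy-poll instead of sleeping: this is where the extra CPU goes. */
        while (now_ns() - idle_since < poll_stop_idle_ns) {
                if (vq_has_work()) {
                        vq_process();
                        idle_since = now_ns();   /* reset the idle window */
                }
        }
        /* Nothing arrived within the window: go back to blocking on kicks. */
        vq_wait_for_kick();
}

int main(void)
{
        vhost_worker_iteration();
        return 0;
}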

> - Should this be tunable from userspace per vhost instance?
>   Why is it only tunable globally?

It should be tunable per vhost thread.
We can do it in a subsequent patch.
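
Purely as a hypothetical illustration of what a per-instance knob could look 
like (neither this ioctl number nor the helper exists anywhere yet; only the 
0xAF VHOST ioctl magic is real), something along these lines could be added 
later:

#include <stdint.h>
#include <sys/ioctl.h>

/* Imaginary ioctl, for illustration only. */
#define VHOST_SET_POLL_STOP_IDLE _IOW(0xAF, 0x70, uint64_t)

/* Sets the polling timeout (in nanoseconds) on one vhost device fd,
 * i.e. per instance rather than via a global module parameter. */
static int set_poll_stop_idle(int vhost_fd, uint64_t ns)
{
        return ioctl(vhost_fd, VHOST_SET_POLL_STOP_IDLE, &ns);
}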

> - How bad is it if you don't pin vhost and vcpu threads?
>   Is the scheduler smart enough to pull them apart?
> - What happens in overcommit scenarios? Does polling make things
>   much worse?
>   Clearly polling will work worse if e.g. vhost and vcpu
>   share the host cpu. How can we avoid conflicts?
> 
>   For two last questions, better cooperation with host scheduler will
>   likely help here.
>   See e.g. http://thread.gmane.org/gmane.linux.kernel/1771791/focus=1772505
>   I'm currently looking at pushing something similar upstream,
>   if it goes in vhost polling can do something similar.
> 
> Any data points to shed light on these questions?

I ran a simple Apache benchmark in an overcommit scenario, where the vcpu 
and vhost threads share the same core.
In some cases (c > 4 in my test cases) polling surprisingly produced better 
throughput.
It is therefore hard to predict in advance how polling will impact 
performance; it is up to whoever enables this optimization to use it wisely.
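
One way to force that shared-core case, for example, is to pin both threads 
to the same host CPU. This is only a sketch with hypothetical thread ids; in 
practice they come from the qemu process tasks and the corresponding 
vhost-<pid> kernel thread:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>

static int pin_to_cpu(pid_t tid, int cpu)
{
        cpu_set_t set;

        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        return sched_setaffinity(tid, sizeof(set), &set);
}

int main(void)
{
        pid_t vcpu_tid  = 1234;          /* hypothetical vcpu thread id  */
        pid_t vhost_tid = 5678;          /* hypothetical vhost thread id */

        /* Put both on the same host core to create the overcommit case. */
        if (pin_to_cpu(vcpu_tid, 2) || pin_to_cpu(vhost_tid, 2))
                perror("sched_setaffinity");
        return 0;
}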
Thanks,
Razya 

