On 2016-10-12 03:40, Yang Zhang wrote:
On 2016/10/12 0:36, Greg wrote:
Hello,
I am currently running into a latency problem when using KVM with virtio.
I have no idea whether the problem comes from KVM, the kernel or QEMU, so
I am sending my question here first; maybe you can give me some answers.
On a Debian stretch host, I've installed KVM (kernel 4.7), libvirt (2.2)
and QEMU (2.6). The vhost_net module is loaded and used on this host.
Four Debian stretch guests were deployed on it with virtio-scsi and
virtio-net.
Between guests there is no bandwidth problem at all (an impressive
25 Gb/s!). Only the latency seems a little too high for what this KVM
setup will be used for.
The bridge for the guests was created by libvirt.
Ping with 64-byte packets (rtt min/avg/max/mdev):
Host -> Guest  : 0.102/0.134/0.161/0.023 ms
Guest -> Host  : 0.095/0.122/0.152/0.016 ms
Guest -> Guest : 0.118/0.184/0.218/0.025 ms
Now, I've found a strange "solution": if I generate some traffic
(100 Mbit/s with iperf), the ping gets much better:
# iperf -s & ping -c 20 192.168.0.23
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
PING 192.168.0.23 (192.168.0.23) 56(84) bytes of data.
64 bytes from 192.168.0.23: icmp_seq=1 ttl=64 time=0.130 ms
64 bytes from 192.168.0.23: icmp_seq=2 ttl=64 time=0.225 ms
64 bytes from 192.168.0.23: icmp_seq=3 ttl=64 time=0.212 ms
[  4] local 192.168.0.22 port 5001 connected with 192.168.0.23 port 53350
64 bytes from 192.168.0.23: icmp_seq=4 ttl=64 time=0.063 ms
64 bytes from 192.168.0.23: icmp_seq=5 ttl=64 time=0.065 ms
64 bytes from 192.168.0.23: icmp_seq=6 ttl=64 time=0.063 ms
64 bytes from 192.168.0.23: icmp_seq=7 ttl=64 time=0.058 ms
64 bytes from 192.168.0.23: icmp_seq=8 ttl=64 time=0.067 ms
64 bytes from 192.168.0.23: icmp_seq=9 ttl=64 time=0.056 ms
64 bytes from 192.168.0.23: icmp_seq=10 ttl=64 time=0.053 ms
64 bytes from 192.168.0.23: icmp_seq=11 ttl=64 time=0.066 ms
64 bytes from 192.168.0.23: icmp_seq=12 ttl=64 time=0.058 ms
64 bytes from 192.168.0.23: icmp_seq=13 ttl=64 time=0.071 ms
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec   119 MBytes   100 Mbits/sec
64 bytes from 192.168.0.23: icmp_seq=14 ttl=64 time=0.197 ms
64 bytes from 192.168.0.23: icmp_seq=15 ttl=64 time=0.211 ms
64 bytes from 192.168.0.23: icmp_seq=16 ttl=64 time=0.219 ms
64 bytes from 192.168.0.23: icmp_seq=17 ttl=64 time=0.199 ms
64 bytes from 192.168.0.23: icmp_seq=18 ttl=64 time=0.217 ms
64 bytes from 192.168.0.23: icmp_seq=19 ttl=64 time=0.197 ms
64 bytes from 192.168.0.23: icmp_seq=20 ttl=64 time=0.215 ms
--- 192.168.0.23 ping statistics ---
20 packets transmitted, 20 received, 0% packet loss, time 19004ms
rtt min/avg/max/mdev = 0.053/0.132/0.225/0.072 ms
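To make the effect in this run explicit, the twenty RTTs can be split into the ten replies received while the 10-second iperf transfer was running (icmp_seq 4 to 13, between the "connected" and summary lines) and the other ten. The sketch below recomputes ping's min/avg/max/mdev for each group; the integer-microsecond arithmetic is my understanding of how iputils ping computes its summary, and the helper names are hypothetical.

```python
import math

# RTTs (ms) copied from the ping run above, in icmp_seq order 1..20.
RTTS_MS = [0.130, 0.225, 0.212, 0.063, 0.065, 0.063, 0.058, 0.067, 0.056, 0.053,
           0.066, 0.058, 0.071, 0.197, 0.211, 0.219, 0.199, 0.217, 0.197, 0.215]
# icmp_seq values received while the 10 s iperf transfer was active.
DURING_IPERF = range(4, 14)


def ping_summary(rtts_ms):
    """Return (min, avg, max, mdev) in ms using truncating integer-µs math."""
    us = [round(r * 1000) for r in rtts_ms]    # whole microseconds
    n = len(us)
    avg = sum(us) // n                         # truncated mean
    mean_sq = sum(u * u for u in us) // n      # truncated mean of squares
    mdev = math.isqrt(mean_sq - avg * avg)     # population std dev, truncated
    return tuple(v / 1000 for v in (min(us), avg, max(us), mdev))


def fmt(stats):
    """Format stats the way ping prints its summary line."""
    return "rtt min/avg/max/mdev = " + "/".join(f"{v:.3f}" for v in stats) + " ms"


busy = [r for seq, r in enumerate(RTTS_MS, 1) if seq in DURING_IPERF]
idle = [r for seq, r in enumerate(RTTS_MS, 1) if seq not in DURING_IPERF]

print("all:  ", fmt(ping_summary(RTTS_MS)))
print("iperf:", fmt(ping_summary(busy)))
print("idle: ", fmt(ping_summary(idle)))
```

With the data above it reproduces ping's own summary (0.053/0.132/0.225/0.072 ms) and shows the ten replies during the transfer averaging 0.062 ms against 0.202 ms for the other ten, so the overall average hides a clearly bimodal distribution.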
I thought the issue came from large send offload, so I disabled sg, tso,
ufo, gso and gro with ethtool, without any improvement (but the bandwidth
dropped dramatically, from 25 Gb/s to 4 Gb/s).
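Disabling those offloads comes down to one "ethtool -K <interface> <feature> off ..." call per interface. A small, hypothetical Python wrapper that builds that command and only runs it when asked might look like this; the interface name "eth0", running it inside each guest on the virtio-net interface, and the dry-run default are assumptions, not details from the thread.

```python
import subprocess

# Offload features toggled in the test above, by their ethtool short names.
OFFLOADS = ("sg", "tso", "ufo", "gso", "gro")


def offload_cmd(iface, enable):
    """Build argv for: ethtool -K <iface> sg off tso off ufo off ..."""
    state = "on" if enable else "off"
    argv = ["ethtool", "-K", iface]
    for feature in OFFLOADS:
        argv += [feature, state]
    return argv


def set_offloads(iface, enable, dry_run=True):
    """Print the command; execute it (needs root) only if dry_run is False."""
    argv = offload_cmd(iface, enable)
    print(" ".join(argv))
    if not dry_run:
        # check=True raises if ethtool refuses a feature on this kernel/driver.
        subprocess.run(argv, check=True)
    return argv


if __name__ == "__main__":
    set_offloads("eth0", enable=False)  # "eth0" is a placeholder name
```

Passing enable=True with the same interface re-enables the features, which is how the 25 Gb/s bandwidth would be restored after the experiment.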
If I ping one of the guests involved in the iperf run from a third guest,
I see a small improvement (not as large as the one between the two nodes
running iperf).
Pinging the fourth guest from the third one shows no impact at all; I
still get ~190 µs.
Do you have any idea how I can improve the latency without generating
dummy traffic?
I observed the same problem a long time ago. I think the main problem is
the cost of vCPU scheduling. When you use iperf to generate extra
traffic, the guest is busy handling network packets and may already be in
guest mode when the ping packet arrives, so no kick/reschedule is needed.
Furthermore, with posted interrupts (I am not sure whether your CPU has
them), interrupt delivery costs almost nothing.
You can try running a simple workload that occupies 100% of the CPU
inside the guest instead of iperf, to see whether it has the same effect.
Alternatively, adding idle=poll to the guest kernel command line to force
polling also works.
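The first experiment only needs something that keeps a guest vCPU at 100%. A minimal sketch, assuming Python 3 is available in the guest: spin() busy-waits on one CPU for a given number of seconds without ever sleeping, so one copy per vCPU would be needed to keep the whole guest busy. The script name and the one-copy-per-vCPU approach are assumptions, not part of the suggestion above.

```python
import sys
import time


def spin(seconds):
    """Keep one CPU busy for `seconds` of wall time; return the loop count."""
    deadline = time.monotonic() + seconds
    iterations = 0
    while time.monotonic() < deadline:  # pure busy-wait, never yields to idle
        iterations += 1
    return iterations


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. "python3 spin.py 60" spins for one minute on one vCPU
    spin(float(sys.argv[1]))
```

Pinging the guest while it runs, then again after it exits, should show whether keeping the vCPU out of the halt path gives the same gain as the iperf traffic did.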
Yes, you're partially right.
I've done some tests on how to reduce interrupt latency. Using idle=poll
or a stress test in the guest roughly halves the latency (but is a real
hog on the host :/).
Using "intel_idle.max_cstate=0 processor.max_cstate=0 idle=poll" on the
host gives the same result (without the CPU hog).
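To confirm that the host really booted with these parameters, its kernel command line (/proc/cmdline on Linux) can be compared against them. A sketch under simplifying assumptions: tokens are whitespace-separated, quoting is ignored, and the kernel's treatment of "-" and "_" as equivalent in parameter names is not handled. The helper names are hypothetical.

```python
# Host kernel parameters reported above for keeping CPUs out of deep C-states.
WANTED = {
    "intel_idle.max_cstate": "0",
    "processor.max_cstate": "0",
    "idle": "poll",
}


def parse_cmdline(cmdline):
    """Split a kernel command line into {key: value}; bare flags map to ''."""
    params = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        params[key] = value  # a later duplicate overrides an earlier one
    return params


def missing_params(cmdline, wanted=WANTED):
    """Return wanted key=value pairs that are absent or set differently."""
    params = parse_cmdline(cmdline)
    return [f"{k}={v}" for k, v in wanted.items() if params.get(k) != v]


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            current = f.read()
    except OSError:
        current = ""  # not Linux, or /proc unavailable
    print(missing_params(current) or "all idle-tuning parameters present")
```

An empty list means all three settings are active; anything returned still has to be added to the boot loader configuration and the host rebooted.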
In conclusion, with these settings I get ~120 µs between guests (down
from 180 µs previously).
Running iperf on top of these modifications still gives better results,
~60 µs (a third of the initial value, half of the idle=poll value).
Is there anything else I could tune?
Thank you all for the good work on KVM, keep it up! :)
Best regards,
Greg
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html