On 2017-11-27 02:38 PM, David Hill wrote:
On 2017-11-26 10:44 PM, Jason Wang wrote:
On 2017-11-25 00:22, David Hill wrote:
The VMs all have 2 vNICs ... and this is the hypervisor:
[root@zappa ~]# brctl show
bridge name     bridge id               STP enabled     interfaces
virbr0          8000.525400914858       yes             virbr0-nic
                                                        vnet0
                                                        vnet1
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 84:2b:2b:13:f2:91 brd ff:ff:ff:ff:ff:ff
inet redacted/24 brd 173.178.138.255 scope global dynamic eno1
valid_lft 48749sec preferred_lft 48749sec
inet6 fe80::862b:2bff:fe13:f291/64 scope link
valid_lft forever preferred_lft forever
3: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 84:2b:2b:13:f2:92 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.3/24 brd 192.168.1.255 scope global eno2
valid_lft forever preferred_lft forever
inet6 fe80::862b:2bff:fe13:f292/64 scope link
valid_lft forever preferred_lft forever
4: virbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 52:54:00:91:48:58 brd ff:ff:ff:ff:ff:ff
inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
valid_lft forever preferred_lft forever
inet 192.168.122.10/32 scope global virbr0
valid_lft forever preferred_lft forever
inet 192.168.122.11/32 scope global virbr0
valid_lft forever preferred_lft forever
inet 192.168.122.12/32 scope global virbr0
valid_lft forever preferred_lft forever
inet 192.168.122.15/32 scope global virbr0
valid_lft forever preferred_lft forever
inet 192.168.122.16/32 scope global virbr0
valid_lft forever preferred_lft forever
inet 192.168.122.17/32 scope global virbr0
valid_lft forever preferred_lft forever
inet 192.168.122.18/32 scope global virbr0
valid_lft forever preferred_lft forever
inet 192.168.122.31/32 scope global virbr0
valid_lft forever preferred_lft forever
inet 192.168.122.32/32 scope global virbr0
valid_lft forever preferred_lft forever
inet 192.168.122.33/32 scope global virbr0
valid_lft forever preferred_lft forever
inet 192.168.122.34/32 scope global virbr0
valid_lft forever preferred_lft forever
inet 192.168.122.35/32 scope global virbr0
valid_lft forever preferred_lft forever
inet 192.168.122.36/32 scope global virbr0
valid_lft forever preferred_lft forever
inet 192.168.122.37/32 scope global virbr0
valid_lft forever preferred_lft forever
inet 192.168.122.45/32 scope global virbr0
valid_lft forever preferred_lft forever
inet 192.168.122.46/32 scope global virbr0
valid_lft forever preferred_lft forever
inet 192.168.122.47/32 scope global virbr0
valid_lft forever preferred_lft forever
inet 192.168.122.48/32 scope global virbr0
valid_lft forever preferred_lft forever
inet 192.168.122.49/32 scope global virbr0
valid_lft forever preferred_lft forever
inet 192.168.122.50/32 scope global virbr0
valid_lft forever preferred_lft forever
inet 192.168.122.51/32 scope global virbr0
valid_lft forever preferred_lft forever
5: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc fq_codel master virbr0 state DOWN group default qlen 1000
link/ether 52:54:00:91:48:58 brd ff:ff:ff:ff:ff:ff
125: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1360 qdisc fq_codel state UNKNOWN group default qlen 100
link/none
inet 10.10.122.28/21 brd 10.10.127.255 scope global tun0
valid_lft forever preferred_lft forever
inet6 fe80::1f9b:bfd4:e9c9:2059/64 scope link stable-privacy
valid_lft forever preferred_lft forever
402: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master virbr0 state UNKNOWN group default qlen 1000
link/ether fe:54:00:09:27:39 brd ff:ff:ff:ff:ff:ff
inet6 fe80::fc54:ff:fe09:2739/64 scope link
valid_lft forever preferred_lft forever
403: vnet1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master virbr0 state UNKNOWN group default qlen 1000
link/ether fe:54:00:ea:6b:18 brd ff:ff:ff:ff:ff:ff
inet6 fe80::fc54:ff:feea:6b18/64 scope link
valid_lft forever preferred_lft forever
I could not reproduce this locally by simply running netperf through
an mlx4 card. Some more questions:
- What kind of workloads did you run in the guest?
- Do you hit this issue only with a specific type of network card (I
guess Broadcom is used in this case)?
- virbr0 looks like a bridge created by libvirt that does NAT and
other things. Can you still hit this issue if you don't use virbr0?
More importantly, zerocopy is known to have issues. For production
environments, you should disable it through the vhost_net module
parameters.
Thanks
I'm deploying an overcloud through an undercloud virtual machine... The
VM has 4 vCPUs and 16GB of RAM as well as two virtio NICs, so I'm using
only virtual hardware here.
I spawn 7 VMs on the hypervisor and deploy an overcloud using TripleO
on them ... everything's virtual, and if I remove the bridge, then I'll
have to configure each VM differently.
The load is quite high on the VM that won't shut down, but when I shut
it down, it's doing nothing ... This is a hard bug to troubleshoot
and I can't bisect the kernel because at some point the system simply
won't boot properly.
I've disabled zerocopy with the following:
[root@zappa modprobe.d]# cat vhost-net.conf
options vhost_net experimental_zcopytx=0
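As a side note, the modprobe.d option is only read when the vhost_net module is loaded, so it can be worth reading the value back from sysfs to confirm it took effect. The snippet below is only a rough sketch: the function name zcopytx_status and its optional path argument are made up for illustration, and the default path assumes the module is loaded and exposes its parameters under /sys/module.

```shell
#!/bin/sh
# Hypothetical helper: report whether vhost_net zerocopy TX is enabled.
# The optional first argument overrides the parameter file (useful for testing);
# the default is the usual sysfs location for a loaded vhost_net module.
zcopytx_status() {
    param_file="${1:-/sys/module/vhost_net/parameters/experimental_zcopytx}"

    # If the file is missing, the module is probably not loaded.
    if [ ! -r "$param_file" ]; then
        echo "unknown"
        return 1
    fi

    value=$(cat "$param_file")
    case "$value" in
        0) echo "disabled" ;;
        *) echo "enabled" ;;
    esac
}
```

Running zcopytx_status with no argument reads the live value. If it still reports "enabled" after the vhost-net.conf file has been added, the module was most likely loaded before the option was in place, and it needs to be reloaded (with the guests stopped) or the host rebooted.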
And I haven't reproduced this issue so far. The problem I have right
now is that experimental_zcopytx has been enabled by default since this
commit:
commit f9611c43ab0ddaf547b395c90fb842f55959334c
Author: Michael S. Tsirkin <mst@xxxxxxxxxx>
Date: Thu Dec 6 14:56:00 2012 +0200
vhost-net: enable zerocopy tx by default
Zero copy TX has been around for a while now.
We seem to be down to eliminating theoretical bugs
and performance tuning at this point:
it's probably time to enable it by default so that
most users get the benefit.
Keep the flag around meanwhile so users can experiment
with disabling this if they experience regressions.
I expect that we will remove it in the future.
Signed-off-by: Michael S. Tsirkin <mst@xxxxxxxxxx>
I'll make a few more attempts at reproducing this issue and I'll keep you posted.
Thank you very much,
David Hill