Re: [PATCH] Fixed missing VM vport when batch start or migration partially failed

To complete the circle, here is my response to a *different* patch trying to fix this same problem. I did a bit more investigating during my reply, so there is better / more complete information:

   https://www.redhat.com/archives/libvir-list/2020-June/msg00681.html

On 6/15/20 11:10 PM, Wei Gong wrote:
Environment: libvirt-4.3.0, qemu-kvm-ev-2.10.0, kernel-3.10.0-1062, CentOS 7, openvswitch-2.3.1

VM network XML:
<interface type='bridge'>
  <mac address='52:54:00:46:45:95'/>
  <source bridge='ovsbr-mgt'/>
  <vlan>
    <tag id='0'/>
  </vlan>
  <virtualport type='openvswitch'>
    <parameters interfaceid='596c6ab7-4557-4935-af97-62a35d933f8d'/>
  </virtualport>
  <target dev='vnet0'/>
  <model type='virtio'/>
  <link state='up'/>
  <alias name='net0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</interface>

When qemuProcessStart in qemu_process.c fails, qemuProcessStop runs to clean up.
It first stops the qemu process. At that point the kernel reclaims the tap device,
and the same tap device name can be allocated to another virtual machine.
Only afterwards does qemuProcessStop remove the OVS port.
Because qemuProcessStart and qemuProcessStop for different virtual machines can run
concurrently, the OVS port removal in qemuProcessStop may remove a port that now
belongs to another virtual machine when the openvswitch virtualport type is used.

For example:
vm1 fails to start, and its tap device vnet0 is reclaimed first. In the interval
before libvirt removes the OVS port vnet0 for vm1, vm2 starts, is allocated the same
tap device name vnet0, and adds port vnet0 to OVS. The port removal issued for vm1
then deletes the vnet0 port that now belongs to vm2.
vm2 can no longer reach the network through vnet0.
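
The interleaving above can be modelled in a few lines of C. This is only a sketch under stated assumptions: `struct port_table` and the helpers `add_port`, `del_port_by_name`, `del_port_if_owner`, and `vm2_port_survives` are hypothetical stand-ins for an OVS bridge and for the `ovs-vsctl` add-port / del-port calls; they are not libvirt or Open vSwitch APIs. It shows that a removal keyed only on the tap name deletes whichever port currently holds that name, while a removal that also compares the interface ID (an illustrative alternative, not what the patch in this thread does) leaves vm2's port in place.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_PORTS 8

/* Hypothetical model of one bridge port: tap name plus owning iface-id. */
struct port {
    bool used;
    char name[16];
    char iface_id[40];
};

struct port_table {
    struct port ports[MAX_PORTS];
};

static struct port *find_port(struct port_table *t, const char *name)
{
    for (int i = 0; i < MAX_PORTS; i++) {
        if (t->ports[i].used && strcmp(t->ports[i].name, name) == 0)
            return &t->ports[i];
    }
    return NULL;
}

/* Models "--if-exists del-port NAME -- add-port BRIDGE NAME":
 * any existing port of that name is replaced by the new owner. */
static bool add_port(struct port_table *t, const char *name,
                     const char *iface_id)
{
    struct port *p = find_port(t, name);
    for (int i = 0; i < MAX_PORTS && p == NULL; i++) {
        if (!t->ports[i].used)
            p = &t->ports[i];
    }
    if (p == NULL)
        return false; /* table full */
    p->used = true;
    snprintf(p->name, sizeof(p->name), "%s", name);
    snprintf(p->iface_id, sizeof(p->iface_id), "%s", iface_id);
    return true;
}

/* Models "--if-exists del-port NAME": removal by name only. */
static bool del_port_by_name(struct port_table *t, const char *name)
{
    struct port *p = find_port(t, name);
    if (p == NULL)
        return false;
    p->used = false;
    return true;
}

/* Illustrative alternative: remove only if the caller still owns the port. */
static bool del_port_if_owner(struct port_table *t, const char *name,
                              const char *iface_id)
{
    struct port *p = find_port(t, name);
    if (p == NULL || strcmp(p->iface_id, iface_id) != 0)
        return false;
    p->used = false;
    return true;
}

/* Replays the interleaving and reports whether vm2's port survives. */
static bool vm2_port_survives(bool check_owner)
{
    struct port_table t = {0};

    add_port(&t, "vnet0", "vm1-iface"); /* vm1 start adds its port */
    /* vm1 fails; the kernel reclaims tap vnet0; vm2 gets the same name. */
    add_port(&t, "vnet0", "vm2-iface");
    /* vm1's delayed cleanup runs only now. */
    if (check_owner)
        del_port_if_owner(&t, "vnet0", "vm1-iface");
    else
        del_port_by_name(&t, "vnet0");

    struct port *p = find_port(&t, "vnet0");
    return p != NULL && strcmp(p->iface_id, "vm2-iface") == 0;
}
```

Calling `vm2_port_survives(false)` replays the by-name removal and reports that vm2's port is gone; `vm2_port_survives(true)` applies the owner check and reports that the port remains.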

To reproduce:
Batch start or migrate 10 virtual machines to the same node, with one of the virtual machines failing to start.
The failure may be caused by unreachable storage or by something else (when we reproduced this internally,
one of the virtual machines was connected to invalid storage so that it would fail deliberately).

Impact:
After batch migration, the network of one virtual machine is unreachable,
and the service running in that virtual machine is interrupted.

Logs of the ovs-vsctl calls made by libvirt:
Jun 10 19:11:32 zbs-sh-elf-11 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=5 -- --if-exists del-port vnet4 -- add-port ovsbr-mgt vnet4 tag=0 -- set Interface vnet4 "external-ids:attached-mac=\"52:54:00:92:7e:7f\"" -- set Interface vnet4 "external-ids:iface-id=\"afb3a67a-5e5d-4ca6-b625-ebce6a9c8d03\"" -- set Interface vnet4 "external-ids:vm-id=\"7b9e4d5a-e8e9-4527-9b89-dd1f74d02526\"" -- set Interface vnet4 external-ids:iface-status=active
Jun 10 19:11:32 zbs-sh-elf-11 kernel: device vnet4 entered promiscuous mode
Jun 10 19:11:32 zbs-sh-elf-11 kernel: device vnet4 left promiscuous mode
Jun 10 19:11:32 zbs-sh-elf-11 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=5 -- --if-exists del-port vnet4 -- add-port ovsbr-mgt vnet4 tag=0 -- set Interface vnet4 "external-ids:attached-mac=\"52:54:00:b7:f4:07\"" -- set Interface vnet4 "external-ids:iface-id=\"c837d02d-4a4e-4f9c-9bee-7e5efce01a8e\"" -- set Interface vnet4 "external-ids:vm-id=\"83035f1e-faed-43d6-951e-08c90c9006a9\"" -- set Interface vnet4 external-ids:iface-status=active
Jun 10 19:11:32 zbs-sh-elf-11 kernel: device vnet4 entered promiscuous mode
Jun 10 19:11:32 zbs-sh-elf-11 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=5 -- --if-exists del-port vnet4


Thanks  

On Tue, Jun 16, 2020 at 10:01 AM, Laine Stump <laine@xxxxxxxxxx> wrote:
On 6/15/20 2:04 PM, Daniel Henrique Barboza wrote:
>
>
> On 6/12/20 3:18 AM, gongwei@xxxxxxxxxx wrote:
>> From: gongwei <gongwei@xxxxxxxxxx>
>>
>> When a start fails, do not remove the openvswitch port;
>> the port recycling in this case is left to openvswitch itself
>>
>> Signed-off-by: gongwei <gongwei@xxxxxxxxxx>
>> ---
>
> Can you please elaborate on the commit message? By the commit title and
> the code, I'm assuming that you're saying that we shouldn't remove the
> openvswitch port if the QEMU process failed to start, for any other
> reason aside from SHUTOFF_FAILED.


More importantly, what "port recycling" takes effect depending on
how the qemu process is stopped (which I would think wouldn't make any
difference to OVS), and why is it necessary for libvirt not to do it?


Up until now, what I have known is that ports will not be removed from
an OVS switch unless they are explicitly removed with ovs-vsctl, and
this attachment will persist across reboots of the host system. As a
matter of fact I've had cases during development where libvirt didn't
remove the OVS port for a tap device when a guest was terminated, and
then many *days* (and several reboots) later the same tap device name
was used for a different guest that was using a Linux host bridge, and
the tap device failed to attach to the Linux host bridge because it had
already been auto-attached back to the OVS switch as soon as it was created.


Can you describe how to reproduce the situation where libvirt removes
the OVS port when it shouldn't, and what is the bad outcome of that
happening?



>
> The code itself looks ok.
>
>
>
>>   src/qemu/qemu_process.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
>> index d36088ba98..439bd5b396 100644
>> --- a/src/qemu/qemu_process.c
>> +++ b/src/qemu/qemu_process.c
>> @@ -7482,7 +7482,8 @@ void qemuProcessStop(virQEMUDriverPtr driver,
>>           if (vport) {
>>               if (vport->virtPortType == VIR_NETDEV_VPORT_PROFILE_MIDONET) {
>>                   ignore_value(virNetDevMidonetUnbindPort(vport));
>> -            } else if (vport->virtPortType == VIR_NETDEV_VPORT_PROFILE_OPENVSWITCH) {
>> +            } else if (vport->virtPortType == VIR_NETDEV_VPORT_PROFILE_OPENVSWITCH &&
>> +                       reason != VIR_DOMAIN_SHUTOFF_FAILED) {
>>                   ignore_value(virNetDevOpenvswitchRemovePort(
>>                                    virDomainNetGetActualBridgeName(net),
>>                                    net->ifname));
>>
>



--

龚伟


Mobile: 18883262137


