Re: [PATCH] network: Fix a race condition when shutdown & start vm at the same time

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 6/19/20 12:00 PM, Michal Privoznik wrote:
On 6/17/20 1:46 AM, Laine Stump wrote:


What do I think should be done? Good question. Possibly we could:


A) Call virNetDevTapReattachBridge() rather than virNetDevTapAttachBridge() in virNetDevTapCreateInBridgePort(). This would eliminate problem (2).


B) Instead of checking if the tap device MAC address matches, just call virNetDevExists() - if it exists, then skip the RemovePort() - this eliminates problems (3) and (4). (NB - this would fail if it turns out that tap device deletion isn't completed synchronously with qemu process termination!)


C) If we want to make it 100% sound, we need to make "check for interface existence + removeport" an atomic operation, and mutually exclusive with virNetDevTapCreate(). This would eliminate problem (1)


I still don't quite understand how there can be a race. I mean, from system POV, libvirt creates a TAP, plugs it into a bridge (when starting the first domain). And when shutting it down and starting the second domain in parallel a new TAP device (with different index and MAC) is created independent of the first TAP, no?


Yes.


There's no problem with the tap device creation/deletion by itself. The problem is when that tap device is attached to an OVS switch. A "port" of the same name as the tap device is created in OVS, and that port continues to exist after the tap device itself is deleted. If a new tap device of the same name is created without first deleting the port from OVS, then the new tap device is automatically connected to the existing port of the same name (I don't know the exact details of how this is done, but OVS must have it's claws somewhere in the kernel or udev to make this happen, as I've seen it (and can reproduce it) myself).


Even that, by itself, isn't a problem. But if you have a tap device attached to an OVS port, and you delete the tap, that frees up the name so that the next time someone asks to create a tap device they get that same name. But if that happens before the old port has been removed from OVS (which is a separate operation), then the new tap device is attached to the old switch. In that case, simple requests to attach the new tap to a different switch (or a Linux bridge) will fail.


We solve that problem when the new tap is being attached to an OVS switch by prepending "--if-exists del-port blah" to the ovs-vsctl command that attaches to the new switch, but that doesn't help in the case that the new tap is being attached to a Linux host bridge.


Performing items (1) and (2) in my list will probably get rid of all bad behavior in practice, but there is still technically the possibility that the thread (B) for the new tap device will create that tap device after the thread (A) for the old tap device checks that the device doesn't exist (i.e. that a new tap with the same name hasn't yet been created) but before removing the port (of the same name) from the OVS switch.


This would all be much simpler if the kernel would put a "FIN WAIT" state on all tap device names so that they couldn't be re-used for a few seconds after deletion, but it doesn't, so we have to work with what we've got.





[Index of Archives]     [Virt Tools]     [Libvirt Users]     [Lib OS Info]     [Fedora Users]     [Fedora Desktop]     [Fedora SELinux]     [Big List of Linux Books]     [Yosemite News]     [KDE Users]     [Fedora Tools]

  Powered by Linux