On Monday, April 20, 2020 6:11:25 A.M. EDT Michal Privoznik wrote: > On 4/16/20 5:57 PM, Mark Asselstine wrote: > > The udev monitor thread "udevEventHandleThread()" will lag the > > actual/real view of devices in sysfs as it serially processes udev > > monitor events. So for instance if you were to run the following cmd > > to create a new veth pair and rename one of the veth endpoints > > > > you might see the following monitor events and real world that looks like > > > > time > > | > > | create v0 sysfs entry > > > > wake udevEventHandleThread | create v1 sysfs entry > > udev_monitor_receive_device(v1-add) | move v0 sysfs to v2 > > udevHandleOneDevice(v1) | > > udev_monitor_receive_device(v0-add) | > > udevHandleOneDevice(v0) | <--- error msgs in > > virNetDevGetLinkInfo() udev_monitor_receive_device(v2-move) | as v0 > > no longer exists udevHandleOneDevice(v2) | > > > > \/ > > > > As you can see the changes in sysfs can take place well before we get > > to act on the events in the udevEventHandleThread(), so by the time we > > get around to processing the v0 add event, the sysfs entry has been > > moved to v2. > > > > To work around this we check if the sysfs entry is valid before > > attempting to read it and don't bother trying to read link info if > > not. This is safe since we will never read sysfs entries earlier than > > it existing, ie. if the entry is not there it has either been removed > > in the time since we enumerated the device or something bigger is > > busted, in either case, no sysfs entry, no link info. In the case > > described above we will eventually get the link info as we work > > through the queue of monitor events and get to the 'move' event. > > > > https://bugzilla.redhat.com/show_bug.cgi?id=1557902 > > > > Signed-off-by: Mark Asselstine <mark.asselstine@xxxxxxxxxxxxx> > > --- > > > > src/util/virnetdev.c | 11 +++++++++++ > > 1 file changed, 11 insertions(+) > > > > diff --git a/src/util/virnetdev.c b/src/util/virnetdev.c > > index b465bdac2e..bf256cbe35 100644 > > --- a/src/util/virnetdev.c > > +++ b/src/util/virnetdev.c > > @@ -2438,6 +2438,17 @@ virNetDevGetLinkInfo(const char *ifname, > > > > if (virNetDevSysfsFile(&path, ifname, "operstate") < 0) > > > > goto cleanup; > > > > + /* The device may have been removed or moved by the time we got here. > > + * Obviously attempting to get LinkInfo on a no longer existing > > device > > + * is useless, so stop processing. If we got here via the udev > > monitor > > + * a remove or move event will follow and we will be able to get > > valid > > + * LinkInfo at that time */ > > + if (!virFileExists(path)) { > > + VIR_WARN("The interface '%s' was removed before we could query > > it.", > > + ifname); > > This is not aligned. Should I send a V2 or can this be correct when merged? > > > + goto cleanup; > > + } > > But I have another idea to consider. We could return a different error > value here (say -2) and then have the caller decide what to do with it > (e.g. print the warning). I had thought about propagating the error so the caller could possibly wrap the warning message in more context but in the end I decided to call it out at the issue site avoiding having each caller essentially reproducing essentially the same code. > > But this is still inherently racy. The "move" can happen just after > we've checked for the @path existence and before this virFileReadAll() > below. > > But I guess this is still better than nothing. Agreed, it isn't 100% fault proof. I didn't have the time to refactor the function to make is safer, but I will definitely think about coming back and doing so in the future. Thanks for your review. Mark > > Reviewed-by: Michal Privoznik <mprivozn@xxxxxxxxxx> > > Michal