On Monday, April 20, 2020 9:30:56 A.M. EDT Michal Privoznik wrote: > On 4/20/20 3:09 PM, Mark Asselstine wrote: > > On Monday, April 20, 2020 6:11:25 A.M. EDT Michal Privoznik wrote: > >> On 4/16/20 5:57 PM, Mark Asselstine wrote: > >>> The udev monitor thread "udevEventHandleThread()" will lag the > >>> actual/real view of devices in sysfs as it serially processes udev > >>> monitor events. So for instance if you were to run the following cmd > >>> to create a new veth pair and rename one of the veth endpoints > >>> > >>> you might see the following monitor events and real world that looks > >>> like > >>> > >>> time > >>> | > >>> | create v0 sysfs entry > >>> > >>> wake udevEventHandleThread | create v1 sysfs entry > >>> udev_monitor_receive_device(v1-add) | move v0 sysfs to v2 > >>> udevHandleOneDevice(v1) | > >>> udev_monitor_receive_device(v0-add) | > >>> udevHandleOneDevice(v0) | <--- error msgs in > >>> virNetDevGetLinkInfo() udev_monitor_receive_device(v2-move) | as > >>> v0 > >>> no longer exists udevHandleOneDevice(v2) | > >>> > >>> \/ > >>> > >>> As you can see the changes in sysfs can take place well before we get > >>> to act on the events in the udevEventHandleThread(), so by the time we > >>> get around to processing the v0 add event, the sysfs entry has been > >>> moved to v2. > >>> > >>> To work around this we check if the sysfs entry is valid before > >>> attempting to read it and don't bother trying to read link info if > >>> not. This is safe since we will never read sysfs entries earlier than > >>> it existing, ie. if the entry is not there it has either been removed > >>> in the time since we enumerated the device or something bigger is > >>> busted, in either case, no sysfs entry, no link info. In the case > >>> described above we will eventually get the link info as we work > >>> through the queue of monitor events and get to the 'move' event. > >>> > >>> https://bugzilla.redhat.com/show_bug.cgi?id=1557902 > >>> > >>> Signed-off-by: Mark Asselstine <mark.asselstine@xxxxxxxxxxxxx> > >>> --- > >>> > >>> src/util/virnetdev.c | 11 +++++++++++ > >>> 1 file changed, 11 insertions(+) > >>> > >>> diff --git a/src/util/virnetdev.c b/src/util/virnetdev.c > >>> index b465bdac2e..bf256cbe35 100644 > >>> --- a/src/util/virnetdev.c > >>> +++ b/src/util/virnetdev.c > >>> @@ -2438,6 +2438,17 @@ virNetDevGetLinkInfo(const char *ifname, > >>> > >>> if (virNetDevSysfsFile(&path, ifname, "operstate") < 0) > >>> > >>> goto cleanup; > >>> > >>> + /* The device may have been removed or moved by the time we got > >>> here. > >>> + * Obviously attempting to get LinkInfo on a no longer existing > >>> device > >>> + * is useless, so stop processing. If we got here via the udev > >>> monitor > >>> + * a remove or move event will follow and we will be able to get > >>> valid > >>> + * LinkInfo at that time */ > >>> + if (!virFileExists(path)) { > >>> + VIR_WARN("The interface '%s' was removed before we could query > >>> it.", > >>> + ifname); > >> > >> This is not aligned. > > > > Should I send a V2 or can this be correct when merged? > > No need, I've fixed it (also changed goto cleanup to return, since I've > merged a patch that made 'cleanup' label go away this morning) and pushed. > > >>> + goto cleanup; > >>> + } > >> > >> But I have another idea to consider. We could return a different error > >> value here (say -2) and then have the caller decide what to do with it > >> (e.g. print the warning). > > > > I had thought about propagating the error so the caller could possibly > > wrap > > the warning message in more context but in the end I decided to call it > > out at the issue site avoiding having each caller essentially reproducing > > essentially the same code. > > Yeah, what convinced me was that non-Linux version of this function > simply prints a debug message and returns 0. I haven't realized that > earlier, sorry. > > Michal No problem. Thanks for taking the patches and making the necessary tweaks. Mark