On 08/24/2017 07:23 AM, Erik Skultety wrote: > Adjust udevEventHandleThread to be a proper thread routine running in an > infinite loop handling devices. Also introduce udevEventThreadData > private structure. > Every time there's and incoming event from udev, udevEventHandleCallback > only increments the number of events queuing on the monitor and signals > the worker thread to handle them. > > Signed-off-by: Erik Skultety <eskultet@xxxxxxxxxx> > --- > src/node_device/node_device_udev.c | 132 ++++++++++++++++++++++++++++++------- > 1 file changed, 108 insertions(+), 24 deletions(-) > Once we get here I'm not sure I understand the udev interaction. I guess it's not "crystal clear" that the socket relationship is a queue of device events. I also haven't studied the udev code nor have I been working through this as much as you have lately - so perhaps something you've uncovered will help educate me in this manner. Still here's my thoughts and a small sliver of investigative data... So far there's been a 1-to-1 interaction between libvirt and udev event. With this change there could be an n-to-1 interaction - as in receive @n devices. Up to this point the interaction has been: 1. udev event 2. validate fd/udev_monitor 3. call udev_monitor_receive_device to get @device 4. process @device 5. unref @device 6. return to udev After this point 1. udev event 2. validate fd/udev_monitor 3. increase event count 4. signal condition to thread to wakeup 5. return to udev asynchronously in a loop 1. wakeup on condition and for each 'event' refcnt 2. decrease event refcnt 3. validate fd/udev_monitor 4. call udev_monitor_receive_device to get @device 5. process @device 6. unref @device If there's more than one event, does udev_monitor_receive_device guarantee the order? Are we sure udev buffers/queues in this manner? That is since we're not grabbing @device before we return to udev when the event is signaled, is there any guarantee from udev that *the same* device will still exist when we do get around to making our call? My concern being how "long" is the data guaranteed to be there. As I read the subsequent patch - I'm thinking perhaps we really want to keep that 1-to-1 relationship. Could the reason that you're getting those messages be because we're now calling udev out of the order it expected? Do we know why we're getting a NULL return? I found: https://sourcecodebrowser.com/udev/141/libudev-monitor_8c.html which seems to indicate a number of reasons to return NULL... Maybe udev is processing another device and while doing so it blocks anyone calling udev_monitor_receive_device. Conversely while processing an event that same block wouldn't be present because the expectation is that the called function will call udev_monitor_receive_device. I think perhaps the processing thread is fine to have; however, instead of processing the event it's processing an @device that was managed in udevEventHandleCallback and put onto some "worker queue". At least that way we know we have a refcnt'd udev_device and we can process in an expected order. John > diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c > index fe943c78b..444e5be4d 100644 > --- a/src/node_device/node_device_udev.c > +++ b/src/node_device/node_device_udev.c > @@ -59,6 +59,54 @@ struct _udevPrivate { > bool privileged; > }; > > +typedef struct _udevEventThreadData udevEventThreadData; > +typedef udevEventThreadData *udevEventThreadDataPtr; > + > +struct _udevEventThreadData { > + virMutex lock; > + virCond threadCond; > + > + bool threadQuit; > + int monitor_fd; > + unsigned long long nevents; /* number of udev events queuing on monitor */ > +}; > + > + > +static udevEventThreadDataPtr > +udevEventThreadDataNew(int fd) > +{ > + udevEventThreadDataPtr ret = NULL; > + > + if (VIR_ALLOC_QUIET(ret) < 0) > + return NULL; > + > + if (virMutexInit(&ret->lock) < 0) > + goto cleanup; > + > + if (virCondInit(&ret->threadCond) < 0) > + goto cleanup_mutex; > + > + ret->monitor_fd = fd; > + > + return ret; > + > + cleanup_mutex: > + virMutexDestroy(&ret->lock); > + > + cleanup: > + VIR_FREE(ret); > + return NULL; > +} > + > + > +static void > +udevEventThreadDataFree(udevEventThreadDataPtr data) > +{ > + virMutexDestroy(&data->lock); > + virCondDestroy(&data->threadCond); > + VIR_FREE(data); > +} > + > > static bool > udevHasDeviceProperty(struct udev_device *dev, > @@ -1647,35 +1695,53 @@ udevCheckMonitorFD(struct udev_monitor *udev_monitor, int fd) > > > static void > -udevEventHandleThread(void *opaque ATTRIBUTE_UNUSED) > +udevEventHandleThread(void *opaque) > { > + udevEventThreadDataPtr privateData = opaque; > struct udev_device *device = NULL; > struct udev_monitor *udev_monitor = NULL; > - int fd = (intptr_t) opaque; > > - nodeDeviceLock(); > - udev_monitor = DRV_STATE_UDEV_MONITOR(driver); > + /* continue rather than break from the loop on non-fatal errors */ > + while (1) { > + virMutexLock(&privateData->lock); > + while (privateData->nevents == 0 && !privateData->threadQuit) { > + if (virCondWait(&privateData->threadCond, &privateData->lock)) { > + virReportSystemError(errno, "%s", > + _("handler failed to wait on condition")); > + goto cleanup; > + } > + } > > - if (!udevCheckMonitorFD(udev_monitor, fd)) > - goto unlock; > + privateData->nevents--; > + virMutexUnlock(&privateData->lock); > > - device = udev_monitor_receive_device(udev_monitor); > - if (device == NULL) { > - virReportError(VIR_ERR_INTERNAL_ERROR, "%s", > - _("udev_monitor_receive_device returned NULL")); > - goto unlock; > + nodeDeviceLock(); > + udev_monitor = DRV_STATE_UDEV_MONITOR(driver); > + > + if (!udevCheckMonitorFD(udev_monitor, privateData->monitor_fd)) { > + nodeDeviceUnlock(); > + goto cleanup; > + } > + > + device = udev_monitor_receive_device(udev_monitor); > + nodeDeviceUnlock(); > + > + if (!device) { > + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", > + _("udev_monitor_receive_device returned NULL")); > + goto next; > + } > + > + udevHandleOneDevice(device); > + > + next: > + udev_device_unref(device); > } > > - nodeDeviceUnlock(); > - udevHandleOneDevice(device); > - > cleanup: > udev_device_unref(device); > + udevEventThreadDataFree(privateData); > return; > - > - unlock: > - nodeDeviceUnlock(); > - goto cleanup; > } > > > @@ -1683,20 +1749,28 @@ static void > udevEventHandleCallback(int watch ATTRIBUTE_UNUSED, > int fd, > int events ATTRIBUTE_UNUSED, > - void *data ATTRIBUTE_UNUSED) > + void *opaque) > { > struct udev_monitor *udev_monitor = NULL; > + udevEventThreadDataPtr threadData = opaque; > > nodeDeviceLock(); > udev_monitor = DRV_STATE_UDEV_MONITOR(driver); > - > if (!udevCheckMonitorFD(udev_monitor, fd)) { > + virMutexLock(&threadData->lock); > + threadData->threadQuit = true; > + virCondSignal(&threadData->threadCond); > + virMutexUnlock(&threadData->lock); > + > nodeDeviceUnlock(); > return; > } > nodeDeviceUnlock(); > > - udevEventHandleThread((void *)(intptr_t) fd); > + virMutexLock(&threadData->lock); > + threadData->nevents++; > + virCondSignal(&threadData->threadCond); > + virMutexUnlock(&threadData->lock); > } > > > @@ -1823,6 +1897,9 @@ nodeStateInitialize(bool privileged, > { > udevPrivate *priv = NULL; > struct udev *udev = NULL; > + int monitor_fd = -1; > + virThread th; > + udevEventThreadDataPtr threadData = NULL; > > if (VIR_ALLOC(priv) < 0) > return -1; > @@ -1883,6 +1960,14 @@ nodeStateInitialize(bool privileged, > 128 * 1024 * 1024); > #endif > > + monitor_fd = udev_monitor_get_fd(priv->udev_monitor); > + if (!(threadData = udevEventThreadDataNew(monitor_fd)) || > + virThreadCreate(&th, false, udevEventHandleThread, threadData) < 0) { > + virReportSystemError(errno, "%s", > + _("failed to create udev handling thread")); > + goto cleanup; > + } > + > /* We register the monitor with the event callback so we are > * notified by udev of device changes before we enumerate existing > * devices because libvirt will simply recreate the device if we > @@ -1891,9 +1976,8 @@ nodeStateInitialize(bool privileged, > * enumeration. The alternative is to register the callback after > * we enumerate, in which case we will fail to create any devices > * that appear while the enumeration is taking place. */ > - priv->watch = virEventAddHandle(udev_monitor_get_fd(priv->udev_monitor), > - VIR_EVENT_HANDLE_READABLE, > - udevEventHandleCallback, NULL, NULL); > + priv->watch = virEventAddHandle(monitor_fd, VIR_EVENT_HANDLE_READABLE, > + udevEventHandleCallback, threadData, NULL); > if (priv->watch == -1) > goto unlock; > > -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list