On Tue, Feb 12, 2019 at 09:46 PM +0100, John Ferlan <jferlan@xxxxxxxxxx> wrote: > On 2/7/19 11:08 AM, Marc Hartmayer wrote: >> Commit "nodedev: Move device enumumeration out of nodeStateInitialize" >> (9f0ae0b18e3e620) has moved the heavy task of device enumeration into >> a separate thread. The problem with this commit is that there is a >> functionality change in the cleanup when udevEnumerateDevices >> fails. Before commit 9f0ae0b18e3e620, the entire driver was cleaned up >> by calling nodeStateCleanup (defined in node_device_udev.c) which >> resulted in libvirtd stopping with the error message >> 'daemonRunStateInit:800 : Driver state initialization failed'. With >> the commit 9f0ae0b18e3e620 there is only a signal to the udev thread >> that it must stop. This means that for example the watch handle isn't >> removed from the main loop and this can result in the main loop >> consuming 100% CPU time as soon as a new udev event occurs. >> >> This patch proposes a simple solution to the described problem. In >> case the udev thread stops the watch handle is removed from the main >> loop. >> >> Fixes: 9f0ae0b18e3e620 ("nodedev: Move device enumumeration out of nodeStateInitialize") >> Signed-off-by: Marc Hartmayer <mhartmay@xxxxxxxxxxxxx> >> --- >> >> Note: I'm not sure whether we should stop libvirtd (as it would have >> been done before) or if this patch is already sufficient. >> >> --- >> src/node_device/node_device_udev.c | 7 +++++++ >> 1 file changed, 7 insertions(+) >> > > What you have seems reasonable - although I wonder if it would be better > to remove the event handle in the error of nodeStateInitializeEnumerate. > > I think the assumption made was that by setting threadQuit that the next > event have some sort of failure through udevEventMonitorSanityCheck > which removes the EventHandle too. Although I cannot be sure it's been > too long to remember for sure ;-) > > Furthermore, should nodeStateCleanup be altered to check for priv->watch > == -1 and thus not worry about the code to set threadQuit, signal, and > joining the thread. A simple check for `priv->threadQuit` is probably even more useful since it’s not safe that the watch callback is set. I’ll send a v2 with the changes you suggested. > > John > > BTW: I'm subscribed to the list, no need to CC... > >> diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c >> index b1e5f00067e8..299f55260129 100644 >> --- a/src/node_device/node_device_udev.c >> +++ b/src/node_device/node_device_udev.c >> @@ -1628,6 +1628,13 @@ udevEventHandleThread(void *opaque ATTRIBUTE_UNUSED) >> } >> >> if (priv->threadQuit) { >> + if (priv->watch != -1) { >> + /* Since the udev thread getting stopped remove the >> + * watch handle from the main loop */ >> + virEventRemoveHandle(priv->watch); >> + priv->watch = -1; >> + } >> + >> virObjectUnlock(priv); >> return; >> } >> > -- Kind regards / Beste Grüße Marc Hartmayer IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Matthias Hartmann Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen Registergericht: Amtsgericht Stuttgart, HRB 243294