On 09/28/2017 06:00 AM, Erik Skultety wrote:
> [...]
>
>>>
>>>      nodeDeviceLock();
>>> +    priv = driver->privateData;
>>>      udev_monitor = DRV_STATE_UDEV_MONITOR(driver);
>>>
>>>      if (!udevEventCheckMonitorFD(udev_monitor, privateData->monitor_fd)) {
>>> @@ -1725,6 +1727,9 @@ udevEventHandleThread(void *opaque)
>>>          device = udev_monitor_receive_device(udev_monitor);
>>>          nodeDeviceUnlock();
>>>
>>> +        /* Re-enable polling for new events on the @udev_monitor */
>>> +        virEventUpdateHandle(priv->watch, VIR_EVENT_HANDLE_READABLE);
>>> +
>>
>> I think this should only be done when privateData->nevents == 0? If we
>> have multiple events to read, then calling virEventPollUpdateHandle
>> (eventually) for every pass through the loop seems like a bit of
>> overkill, especially if udevEventHandleCallback turns right around and
>> disables it again.
>>
>> Also, fortunately there isn't more than one udev thread sending the
>> events, since you access priv->watch without the driver lock...
>>
>> Conversely, perhaps we only disable if events > 1... Then again, how does
>> one get to 2 events queued if we're disabling as soon as we increment
>
> Very good point. Technically events would still get queued, we just wouldn't
> check, and yes, we would process 1 event at a time. Not optimal, but if you
> look at the original code and compare it with this one performance-wise (and
> I hope I haven't missed anything that would render everything I write next
> complete rubbish), the time complexity hasn't changed, the space complexity
> hasn't changed; what changed is code complexity, which makes the code a bit
> slower due to the excessive locking and toggling the FD polling back and
> forth. So essentially you've got the same thing as you had before... but
> asynchronous. However, yes, the usage of @nevents is completely useless now
> (hadn't realized that immediately, thanks) and simple signalling should
> suffice.
Having it "slower" isn't necessarily bad ;-) That gives some of the other
slower buggers a chance to fill in the details we need. Throwing control back
to udev quicker could aid in that too.

>
> So how could we make it faster, though? I thought more about the idea you
> shared in one of the previous reviews, letting the thread actually pull all
> the data from the monitor, to which I (IIRC) replied something to the effect
> that the event counting mechanism wouldn't allow that and it would break.
> Okay, let's drop the event counting. What if we now let both the udev handler
> thread and the event loop "poll" the file descriptor? IOW, let the event loop
> poll the monitor fd, thus invoking udevHandleCallback, which would in turn
> keep signalling the handler thread that there are some data. The difference
> now would be that the handler thread wouldn't blindly trust the callback
> about the data (because of the scheduling issue); it would keep poking the
> monitor itself until it gets either EAGAIN or EWOULDBLOCK from recvmsg()
> called in libudev, which would then be the signal to start waiting on the
> condition. The next time an event appears, a signal from udevHandleCallback
> would finally have a meaning and wouldn't be ignored.

Is making it faster really a goal? It's preferable that it works consistently,
I think. What matters are the various errno possibilities and the desire to
avoid the "udev_monitor_receive_device returned NULL" message, which we got
because we had too many "cooks in the kitchen" trying to determine whether a
new device was really available or whether this was just another notification
for something we're already processing.

Also, not that I expect the udev code to change, but if a new errno is added
then we may have to keep up... always a fear, especially if we're using the
errno to dictate our algorithm.

>
> This way, it's actually the handler thread who's 'the boss', as most of the
> signals from udevHandleCallback would be lost/NOPs.
>
> Would you agree to such an approach?

At some point we get too fancy and think too hard about a problem, letting it
consume us. I think when I presented my idea I wasn't really understanding the
details of how we consume the udev data. Now that I have a bit more of a clue,
perhaps the original idea wasn't so good. If we receive multiple notifications
that a device is ready to be processed even though we could already be
processing it, how are we to truly know whether we missed one, or whether we
really got it and udev was just reminding us again? I'm not against looking at
a different mechanism; the question then becomes, from your perspective, is it
worth it?

John

>
> Erik
>
>> nevents? Wouldn't this totally obviate the need for nevents?
>>
>> I would think it would be overkill to disable/enable for just 1 event.
>> If multiple are queued, then yeah, we're starting to get behind... Of
>> course that could affect the re-enable when events == 1 (or some
>> MAGIC_CONSTANT_NUMBER_THRESHOLD_TO_DISABLE_POLLING)
>
> This magic threshold approach still wouldn't work, because you don't know
> anything about the event at that point; you'd have to process the events to
> actually find out whether an event is legitimate or is an already-noted
> event that hasn't been processed yet, so it would still mess with your
> counters.
>
>>
>> IIRC from the previous trip down this rabbit hole (I did briefly go back
>> and read the comments), what you're trying to avoid is the following
>> message spamming the error logs because of a scheduling mishap... Thus
>> perhaps the following message could only be displayed if nevents > 1 and
>> nothing was found, knowing that we have this "timing problem" between
>> udev, this thread, and the fd polling scanner over when a device should
>> be "found"... and then reset nevents == 0 with the appropriate lock.
>>
>> Curious on your thoughts before an R-B/ACK, though.
>>
>> John
>>
>>
>>>          if (!device) {
>>>              virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
>>>                             _("udev_monitor_receive_device returned NULL"));
>>> @@ -1750,10 +1755,12 @@ udevEventHandleCallback(int watch ATTRIBUTE_UNUSED,
>>>                             int events ATTRIBUTE_UNUSED,
>>>                             void *opaque)
>>>  {
>>> +    udevPrivate *priv = NULL;
>>>      struct udev_monitor *udev_monitor = NULL;
>>>      udevEventThreadDataPtr threadData = opaque;
>>>
>>>      nodeDeviceLock();
>>> +    priv = driver->privateData;
>>>      udev_monitor = DRV_STATE_UDEV_MONITOR(driver);
>>>
>>>      if (!udevEventCheckMonitorFD(udev_monitor, fd)) {
>>> @@ -1771,6 +1778,16 @@ udevEventHandleCallback(int watch ATTRIBUTE_UNUSED,
>>>      threadData->nevents++;
>>>      virCondSignal(&threadData->threadCond);
>>>      virMutexUnlock(&threadData->lock);
>>> +
>>> +    /* Due to scheduling, the event loop might poll the udev monitor
>>> +     * before the handler thread actually removes the data from the
>>> +     * socket, thus causing it to spam errors about data not being ready
>>> +     * yet, because udevEventHandleCallback would be invoked multiple
>>> +     * times, incrementing the counter of queued events for a single
>>> +     * event. Therefore we need to disable polling on the fd until the
>>> +     * thread actually removes the data from the socket.
>>> +     */
>>> +    virEventUpdateHandle(priv->watch, 0);
>>>  }

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list