On Thu, Sep 16, 2021 at 05:54:14PM +0200, Martin Wilck wrote:
> On Thu, 2021-09-16 at 10:06 -0500, Benjamin Marzinski wrote:
> > On Thu, Sep 16, 2021 at 10:54:19AM +0200, Martin Wilck wrote:
> > > On Wed, 2021-09-15 at 23:14 -0500, Benjamin Marzinski wrote:
> > > > On Fri, Sep 10, 2021 at 01:41:16PM +0200, mwilck@xxxxxxxx wrote:
> > > > > From: Martin Wilck <mwilck@xxxxxxxx>
> > > > >
> > > > > The previous patches added the state machine and the timeout handling, but there was no wakeup mechanism for the uxlsnr for cases where client connections were waiting for the vecs lock.
> > > > >
> > > > > This patch uses the previously introduced wakeup mechanism of struct mutex_lock for this purpose. Processes which unlock the "global" vecs lock send an event in an eventfd which the uxlsnr loop is polling for.
> > > > >
> > > > > As we are now woken up for servicing client handlers that don't wait for input but for the lock, we need to set up the pollfds differently, and iterate over all clients when handling events, not only over the ones that are receiving. The hangup handling is changed, too. We have to look at every client, even if one has hung up. Note that I don't take client_lock for the loop in uxsock_listen(), it's not necessary and will be removed elsewhere in a follow-up patch.
> > > > >
> > > > > With this in place, the lock need not be taken in execute_handler() any more. The uxlsnr only ever calls trylock() on the vecs lock, avoiding any waiting for other threads to finish.
> > > > >
> > > > > Signed-off-by: Martin Wilck <mwilck@xxxxxxxx>
> > > > > ---
> > > > >  multipathd/uxlsnr.c | 211 ++++++++++++++++++++++++++++++--------------
> > > > >  1 file changed, 143 insertions(+), 68 deletions(-)
> > > >
> > > > I do worry that if there are, for instance, a lot of uevents coming in, this could starve the uxlsnr thread, since other threads could be grabbing and releasing the vecs lock, but if it's usually being held, then the uxlsnr thread might never try to grab it when it's free, and it will keep losing its place in line. Also, every time that the vecs lock is dropped between ppoll() calls, a wakeup will get triggered, even if the lock was grabbed by something else before the ppoll thread runs.
> > >
> > > I've thought about this too. It's true that the ppoll -> pthread_mutex_trylock() sequence will never acquire the lock if some other thread calls lock() at the same time.
> > >
> > > If multiple processes call lock(), the "winner" of the lock is random. Thus in a way this change actually adds some predictability: the uxlsnr will step back if some other process is actively trying to grab the lock. IMO that's the right thing to do in almost all situations.
> > >
> > > We don't need to worry about "thundering herd" issues because the number of threads that might wait on the lock is rather small. In the worst case, 3 threads (checker, dmevents handler and uevent dispatcher, plus the uxlsnr in ppoll()) wait for the lock at the same time. Usually one of them will have it grabbed. On systems that lack dmevent polling, the number of waiter threads may be higher, but AFAICS it's a very rare condition to have hundreds of dmevents delivered to different maps simultaneously, and if it happens, it's probably correct to have them serviced quickly.
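(To make the mechanism under discussion concrete, here is roughly what it amounts to. This is a simplified sketch only, not the actual patch code; struct wakeup_lock, wakeup_unlock(), and listener_loop() are made-up names for illustration.)

#define _GNU_SOURCE
#include <poll.h>
#include <pthread.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>

/* illustrative stand-in for multipathd's locking wrapper */
struct wakeup_lock {
	pthread_mutex_t mutex;
	int wakeup_fd;	/* assumed to be an eventfd, e.g. eventfd(0, EFD_NONBLOCK) */
};

/* event/checker threads: normal unlock, plus a signal to the listener */
static void wakeup_unlock(struct wakeup_lock *lck)
{
	uint64_t one = 1;

	pthread_mutex_unlock(&lck->mutex);
	/* best-effort wakeup; errors are simply ignored in this sketch */
	(void)write(lck->wakeup_fd, &one, sizeof(one));
}

/* listener loop: never block on the lock, only trylock() after a wakeup */
static void listener_loop(struct wakeup_lock *lck, struct pollfd *pfds, nfds_t nfds)
{
	for (;;) {
		/* pfds[0] is the eventfd; the remaining entries are client sockets */
		if (ppoll(pfds, nfds, NULL, NULL) <= 0)
			continue;

		if (pfds[0].revents & POLLIN) {
			uint64_t cnt;

			(void)read(lck->wakeup_fd, &cnt, sizeof(cnt));	/* drain the counter */
			if (pthread_mutex_trylock(&lck->mutex) == 0) {
				/* ... serve the clients that were waiting for the lock ... */
				/* plain unlock here: the listener need not wake itself */
				pthread_mutex_unlock(&lck->mutex);
			}
		}
		/* ... service clients with pending input here ... */
	}
}

The key point is that the listener never blocks on the mutex: if trylock() fails, some other thread won the race, and that thread will write to the eventfd again when it unlocks, so the listener simply goes back to ppoll(). That is also exactly why it can keep losing the race, as noted above.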
> > > The uevent dispatcher doesn't hold the lock, it's taken and released for every event handled. Thus uxlsnr has a real chance to jump in between uevents. The same holds for the dmevents thread, it takes the lock separately for every map affected. The only piece of code that holds the lock for an extended period of time (except reconfigure(), where it's unavoidable) is the path checker (that's bad, and next on the todo list).
> > >
> > > The really "important" commands (shutdown, reconfigure) don't take the lock and return immediately; the lock is no issue for them. I don't see any other cli command that needs to be served before uevents or dm events.
> > >
> > > I haven't been able to test this on huge configurations with 1000s of LUNs, but I tested with artificial delays in the checker loop, uevent handlers, and dmevent handler, and lots of clients querying the daemon in parallel, and saw that clients were handled very nicely. Some timeouts are inevitable (e.g. if the checker simply holds the lock longer than the uxsock_timeout), but that is no regression.
> > >
> > > Bottom line: I believe that because this patch reduces the busy-wait time, clients will be served more reliably and more quickly than before (more precisely: both average and standard deviation of the service delay will be improved wrt before, and timeouts occur less frequently). I encourage everyone to experiment and see if reality shows that I'm wrong.
> > >
> > > > I suppose the only way to deal with that would be to move the locking commands to a list handled by a separate thread, so that it could block without stalling the non-locking commands.
> > >
> > > Not sure if I understand correctly, just in case: non-locking commands are never stalled with my patch.
> >
> > I realize. I was saying that you could avoid starvation while still allowing non-locking commands to complete by moving the locking commands to a separate thread, which did block on the lock. I didn't consider a ticketing system. Ideally, the checker loop would have the lowest priority, since it isn't responding to any event and usually is just verifying that nothing has changed. But you do make a good point that when we are getting a lot of events, and the uxlsnr loop has a chance of getting starved, we probably want to prioritize the event handling anyway.
>
> I have also thought about using additional threads for handling cli commands. One could either use a single thread, similar to the udev listener/dispatcher pair (your suggestion IIUC), or one thread per (blocking) client.
>
> Moving client handling into separate thread(s) avoids the complexity of the state machine and the eventfd-based wakeup. But on the flip side, it introduces new multithreading-related complexity (of which we already have our fair share). Client tasks running lock(&vecs->lock) in order to serve commands like "multipathd show paths" might now starve event handling, which would be worse than vice versa, IMO.
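For what it's worth, the "ticketing system" I mentioned would amount to something like the FIFO lock sketched below (purely illustrative, with made-up names; it is not something this patch implements or needs). Waiters are served strictly in arrival order, so the uxlsnr could never lose its place in line, but event handlers could then end up queued behind cli commands, which is the opposite of the prioritization we just agreed we want.

#include <pthread.h>

/* illustrative FIFO ("ticket") lock: waiters acquire in arrival order */
struct ticket_lock {
	pthread_mutex_t mutex;
	pthread_cond_t cond;
	unsigned long next_ticket;	/* next ticket number to hand out */
	unsigned long now_serving;	/* ticket that currently owns the lock */
};

static void ticket_lock(struct ticket_lock *tl)
{
	unsigned long my_ticket;

	pthread_mutex_lock(&tl->mutex);
	my_ticket = tl->next_ticket++;
	while (tl->now_serving != my_ticket)
		pthread_cond_wait(&tl->cond, &tl->mutex);
	pthread_mutex_unlock(&tl->mutex);
	/* the caller now logically owns the lock until ticket_unlock() */
}

static void ticket_unlock(struct ticket_lock *tl)
{
	pthread_mutex_lock(&tl->mutex);
	tl->now_serving++;
	/* wake all waiters; only the one holding the next ticket proceeds */
	pthread_cond_broadcast(&tl->cond);
	pthread_mutex_unlock(&tl->mutex);
}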
> Eventually, I found the idea of the poll/wakeup loop with no additional threads more appealing, and more suitable for the task. But I admit that it's a matter of personal taste. I tend to try to use pthreads as little as possible ;-).
>
> So how do we proceed?

I think your argument that we'll only risk starving the uxlsnr thread when it makes sense to prioritize other threads is a good one. So, I'm o.k. with this trylock() solution.

-Ben

>
> Regards,
> Martin
>

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/dm-devel