On 10/17/2017 11:37 AM, Davidlohr Bueso wrote: > On Fri, 13 Oct 2017, Jason Baron wrote: > >> The ep_poll_safewake() function is used to wakeup potentially nested >> epoll >> file descriptors. The function uses ep_call_nested() to prevent entering >> the same wake up queue more than once, and to prevent excessively deep >> wakeup paths (deeper than EP_MAX_NESTS). However, this is not necessary >> since we are already preventing these conditions during EPOLL_CTL_ADD. >> This >> saves extra function calls, and avoids taking a global lock during the >> ep_call_nested() calls. > > This makes sense. > >> >> I have, however, left ep_call_nested() for the CONFIG_DEBUG_LOCK_ALLOC >> case, since ep_call_nested() keeps track of the nesting level, and >> this is >> required by the call to spin_lock_irqsave_nested(). It would be nice to >> remove the ep_call_nested() calls for the CONFIG_DEBUG_LOCK_ALLOC case as >> well, however its not clear how to simply pass the nesting level through >> multiple wake_up() levels without more surgery. In any case, I don't >> think >> CONFIG_DEBUG_LOCK_ALLOC is generally used for production. This patch, >> also >> apparently fixes a workload at Google that Salman Qazi reported by >> completely removing the poll_safewake_ncalls->lock from wakeup paths. > > I'm a bit curious about the workload (which uses lots of EPOLL_CTL_ADDs) as > I was tackling the nested epoll scaling issue with loop and readywalk lists > in mind. >> I'm not sure the details of the workload - perhaps Salman can elaborate further about it. It would seem that the safewake would potentially be the most contended in general in the nested case, because generally you have a few epoll fds attached to lots of sources doing wakeups. So those sources are all going to conflict on the safewake lock. The readywalk is used when performing a 'nested' poll and in general this is likely going to be called on a few epoll fds. That said, we should remove it too. I will post a patch to remove it. The loop lock is used during EPOLL_CTL_ADD to check for loops and deep wakeup paths and so I would expect this to be less common, but I wouldn't doubt there are workloads impacted by it. We can potentially, I think remove this one too - and the global 'epmutex'. I posted some ideas a while ago on it: http://lkml.iu.edu/hypermail//linux/kernel/1501.1/05905.html We can work through these ideas or others... Thanks, -Jason >> Signed-off-by: Jason Baron <jbaron@xxxxxxxxxx> >> Cc: Alexander Viro <viro@xxxxxxxxxxxxxxxxxx> >> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> >> Cc: Salman Qazi <sqazi@xxxxxxxxxx> > > Acked-by: Davidlohr Bueso <dbueso@xxxxxxx> > >> --- >> fs/eventpoll.c | 47 ++++++++++++++++++----------------------------- >> 1 file changed, 18 insertions(+), 29 deletions(-) > > Yay for getting rid of some of the callback hell. > > Thanks, > Davidlohr