On 9/19/19 5:24 AM, hev wrote:
> From: Heiher <r@xxxxxx>
>
> Take the case where we have:
>
>            t0
>             | (ew)
>            e0
>             | (et)
>            e1
>             | (lt)
>            s0
>
> t0: thread 0
> e0: epoll fd 0
> e1: epoll fd 1
> s0: socket fd 0
> ew: epoll_wait
> et: edge-trigger
> lt: level-trigger
>
> When s0 fires an event, e1 catches the event, and then e0 catches an
> event from e1. After this, a thread t0 that calls epoll_wait() many
> times on e0 should only get one event in total, because e1 is added
> to e0 in edge-triggered mode.
>
> This patch only allows the wakeup(&ep->poll_wait) in ep_scan_ready_list
> under two conditions:
>
> 1. depth == 0.
> 2. An event is added to ep->ovflist during processing.
>
> Test code:
> #include <unistd.h>
> #include <sys/epoll.h>
> #include <sys/socket.h>
>
> int main(int argc, char *argv[])
> {
> 	int sfd[2];
> 	int efd[2];
> 	struct epoll_event e;
>
> 	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sfd) < 0)
> 		goto out;
>
> 	efd[0] = epoll_create(1);
> 	if (efd[0] < 0)
> 		goto out;
>
> 	efd[1] = epoll_create(1);
> 	if (efd[1] < 0)
> 		goto out;
>
> 	e.events = EPOLLIN;
> 	if (epoll_ctl(efd[1], EPOLL_CTL_ADD, sfd[0], &e) < 0)
> 		goto out;
>
> 	e.events = EPOLLIN | EPOLLET;
> 	if (epoll_ctl(efd[0], EPOLL_CTL_ADD, efd[1], &e) < 0)
> 		goto out;
>
> 	if (write(sfd[1], "w", 1) != 1)
> 		goto out;
>
> 	if (epoll_wait(efd[0], &e, 1, 0) != 1)
> 		goto out;
>
> 	if (epoll_wait(efd[0], &e, 1, 0) != 0)
> 		goto out;
>
> 	close(efd[0]);
> 	close(efd[1]);
> 	close(sfd[0]);
> 	close(sfd[1]);
>
> 	return 0;
>
> out:
> 	return -1;
> }
>
> More tests:
> https://github.com/heiher/epoll-wakeup
>
> Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Cc: Davide Libenzi <davidel@xxxxxxxxxxxxxxx>
> Cc: Davidlohr Bueso <dave@xxxxxxxxxxxx>
> Cc: Dominik Brodowski <linux@xxxxxxxxxxxxxxxxxxxx>
> Cc: Eric Wong <e@xxxxxxxxx>
> Cc: Jason Baron <jbaron@xxxxxxxxxx>
> Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Cc: Roman Penyaev <rpenyaev@xxxxxxx>
> Cc: Sridhar Samudrala <sridhar.samudrala@xxxxxxxxx>
> Cc: linux-kernel@xxxxxxxxxxxxxxx
> Cc: linux-fsdevel@xxxxxxxxxxxxxxx
> Signed-off-by: hev <r@xxxxxx>
> ---
>  fs/eventpoll.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> index c4159bcc05d9..fa71468dbd51 100644
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -685,6 +685,9 @@ static __poll_t ep_scan_ready_list(struct eventpoll *ep,
>  	if (!ep_locked)
>  		mutex_lock_nested(&ep->mtx, depth);
>
> +	if (!depth || list_empty_careful(&ep->rdllist))
> +		pwake++;
> +
>  	/*
>  	 * Steal the ready list, and re-init the original one to the
>  	 * empty list. Also, set ep->ovflist to NULL so that events
> @@ -755,7 +758,7 @@ static __poll_t ep_scan_ready_list(struct eventpoll *ep,
>  	mutex_unlock(&ep->mtx);
>
>  	/* We have to call this outside the lock */
> -	if (pwake)
> +	if (pwake == 2)
>  		ep_poll_safewake(&ep->poll_wait);
>
>  	return res;

Hi,

I was thinking more like the following. I tried it using your test-suite
and it seems to work. What do you think?
Thanks,

-Jason

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index d7f1f50..662136b 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -712,6 +712,15 @@ static __poll_t ep_scan_ready_list(struct eventpoll *ep,
 	for (nepi = READ_ONCE(ep->ovflist); (epi = nepi) != NULL;
 	     nepi = epi->next, epi->next = EP_UNACTIVE_PTR) {
 		/*
+		 * We only need to wake up nested epoll fds if
+		 * something has been queued to the overflow list,
+		 * since ep_poll() traverses the rdllist during
+		 * recursive poll and thus events on the overflow list
+		 * may not be visible yet.
+		 */
+		if (!pwake)
+			pwake++;
+		/*
 		 * We need to check if the item is already in the list.
 		 * During the "sproc" callback execution time, items are
 		 * queued into ->ovflist but the "txlist" might already
@@ -755,7 +764,7 @@ static __poll_t ep_scan_ready_list(struct eventpoll *ep,
 	mutex_unlock(&ep->mtx);
 
 	/* We have to call this outside the lock */
-	if (pwake)
+	if (pwake == 2)
 		ep_poll_safewake(&ep->poll_wait);
 
 	return res;
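
For contrast, here is a sketch of the level-triggered counterpart of the
test above (not part of either patch; same socketpair/epoll setup, only
without EPOLLET on the nested epoll fd). Here both epoll_wait() calls on
efd[0] are expected to return 1, since efd[1] stays ready until the
pending byte is drained from sfd[0]:

#include <unistd.h>
#include <sys/epoll.h>
#include <sys/socket.h>

int main(int argc, char *argv[])
{
	int sfd[2];
	int efd[2];
	struct epoll_event e;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sfd) < 0)
		goto out;

	efd[0] = epoll_create(1);
	if (efd[0] < 0)
		goto out;

	efd[1] = epoll_create(1);
	if (efd[1] < 0)
		goto out;

	e.events = EPOLLIN;
	if (epoll_ctl(efd[1], EPOLL_CTL_ADD, sfd[0], &e) < 0)
		goto out;

	/* Level-triggered this time: no EPOLLET on the nested epoll fd. */
	e.events = EPOLLIN;
	if (epoll_ctl(efd[0], EPOLL_CTL_ADD, efd[1], &e) < 0)
		goto out;

	if (write(sfd[1], "w", 1) != 1)
		goto out;

	/* Both waits should see the event: efd[1] remains readable
	 * until the pending byte is read from sfd[0]. */
	if (epoll_wait(efd[0], &e, 1, 0) != 1)
		goto out;

	if (epoll_wait(efd[0], &e, 1, 0) != 1)
		goto out;

	close(efd[0]);
	close(efd[1]);
	close(sfd[0]);
	close(sfd[1]);

	return 0;

out:
	return -1;
}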