The patch titled

     epoll: use unlocked wqueue operations

has been added to the -mm tree.  Its filename is

     epoll-use-unlocked-wqueue-operations.patch

See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find
out what to do about this

------------------------------------------------------
Subject: epoll: use unlocked wqueue operations
From: Davide Libenzi <davidel@xxxxxxxxxxxxxxx>

A few days ago Arjan signaled a lockdep red flag on the epoll locks,
specifically between the epoll device structure lock (->lock) and the wait
queue head lock (->lock).  As I explained in another email, and directly to
Arjan, this cannot happen in practice because of the explicit check at
eventpoll.c:592, which does not allow an epoll fd to be dropped inside the
same epoll fd.  Since lockdep works on per-structure locks, it will never be
able to know about policies enforced in other parts of the code.

It was decided some time ago to allow epoll fds to be dropped inside other
epoll fds, which triggers very tricky wakeup operations (due to possibly
reentrant callback-driven wakeups) handled by the ep_poll_safewake()
function.

While looking at the code again, though, I noticed that all the operations
done on the wait queue head of the epoll main structure (->wq) are already
protected by the epoll lock (->lock), so the locked-style functions (the
variants that do not take the wait queue head lock themselves) can be used
to manipulate the ->wq member.  This both saves a lock acquisition and keeps
lockdep happy.
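The idea described above can be sketched in userspace as follows.  This is
only an analogy, not kernel code: struct wq_head, struct ep and every
function name below are hypothetical stand-ins invented for illustration,
and pthread mutexes take the place of the kernel locks.  It assumes that
every access to the queue happens under the outer lock, which is the
property the description relies on for ->wq.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a kernel wait queue head. */
struct wq_head {
	pthread_mutex_t lock;	/* the queue's own lock */
	int nwaiters;		/* tasks currently queued */
	int wakeups;		/* wakeups delivered so far */
};

/* Hypothetical stand-in for the epoll main structure. */
struct ep {
	pthread_mutex_t lock;	/* plays the role of ep->lock */
	struct wq_head wq;	/* plays the role of ep->wq */
};

static void ep_init(struct ep *ep)
{
	pthread_mutex_init(&ep->lock, NULL);
	pthread_mutex_init(&ep->wq.lock, NULL);
	ep->wq.nwaiters = 0;
	ep->wq.wakeups = 0;
}

/* Caller must already hold a lock that serializes all access to wq. */
static void wq_wake_locked(struct wq_head *wq)
{
	if (wq->nwaiters > 0)
		wq->wakeups++;
}

/* Self-locking variant: takes the queue's own lock around the wakeup. */
static void wq_wake(struct wq_head *wq)
{
	pthread_mutex_lock(&wq->lock);
	wq_wake_locked(wq);
	pthread_mutex_unlock(&wq->lock);
}

/* Queue manipulation happens under the outer lock only. */
static void ep_add_waiter(struct ep *ep)
{
	pthread_mutex_lock(&ep->lock);
	ep->wq.nwaiters++;
	pthread_mutex_unlock(&ep->lock);
}

static void ep_remove_waiter(struct ep *ep)
{
	pthread_mutex_lock(&ep->lock);
	ep->wq.nwaiters--;
	pthread_mutex_unlock(&ep->lock);
}

/*
 * The outer lock is already held here, so the locked-style wakeup is
 * enough: the inner lock is never taken, which saves one acquisition
 * and removes the outer-then-inner lock nesting.
 */
static void ep_notify(struct ep *ep)
{
	pthread_mutex_lock(&ep->lock);
	wq_wake_locked(&ep->wq);
	pthread_mutex_unlock(&ep->lock);
}
```

The sketch is only safe if every user of the queue follows the same
convention; mixing wq_wake() and ep_notify() from concurrent threads on the
same queue would leave the counters unprotected.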
Running totalmess on my dual Opteron for a while has not revealed any
problems so far:

http://www.xmailserver.org/totalmess.c

Signed-off-by: Davide Libenzi <davidel@xxxxxxxxxxxxxxx>
Cc: Arjan van de Ven <arjan@xxxxxxxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxx>
---

 fs/eventpoll.c            |   17 ++++++++++-------
 include/linux/eventpoll.h |    2 +-
 2 files changed, 11 insertions(+), 8 deletions(-)

diff -puN fs/eventpoll.c~epoll-use-unlocked-wqueue-operations fs/eventpoll.c
--- devel/fs/eventpoll.c~epoll-use-unlocked-wqueue-operations	2006-06-02 18:13:48.000000000 -0700
+++ devel-akpm/fs/eventpoll.c	2006-06-02 18:13:48.000000000 -0700
@@ -1,6 +1,6 @@
 /*
  * fs/eventpoll.c ( Efficent event polling implementation )
- * Copyright (C) 2001,...,2003 Davide Libenzi
+ * Copyright (C) 2001,...,2006 Davide Libenzi
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -1004,7 +1004,7 @@ static int ep_insert(struct eventpoll *e
 
 		/* Notify waiting tasks that events are available */
 		if (waitqueue_active(&ep->wq))
-			wake_up(&ep->wq);
+			__wake_up_locked(&ep->wq, TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE);
 		if (waitqueue_active(&ep->poll_wait))
 			pwake++;
 	}
@@ -1083,7 +1083,8 @@ static int ep_modify(struct eventpoll *e
 
 			/* Notify waiting tasks that events are available */
 			if (waitqueue_active(&ep->wq))
-				wake_up(&ep->wq);
+				__wake_up_locked(&ep->wq, TASK_UNINTERRUPTIBLE |
+						 TASK_INTERRUPTIBLE);
 			if (waitqueue_active(&ep->poll_wait))
 				pwake++;
 		}
@@ -1260,7 +1261,8 @@ is_linked:
 	 * wait list.
 	 */
 	if (waitqueue_active(&ep->wq))
-		wake_up(&ep->wq);
+		__wake_up_locked(&ep->wq, TASK_UNINTERRUPTIBLE |
+				 TASK_INTERRUPTIBLE);
 	if (waitqueue_active(&ep->poll_wait))
 		pwake++;
 
@@ -1444,7 +1446,8 @@ static void ep_reinject_items(struct eve
 		 * wait list.
 		 */
 		if (waitqueue_active(&ep->wq))
-			wake_up(&ep->wq);
+			__wake_up_locked(&ep->wq, TASK_UNINTERRUPTIBLE |
+					 TASK_INTERRUPTIBLE);
 		if (waitqueue_active(&ep->poll_wait))
 			pwake++;
 	}
@@ -1516,7 +1519,7 @@ retry:
 		 * ep_poll_callback() when events will become available.
 		 */
 		init_waitqueue_entry(&wait, current);
-		add_wait_queue(&ep->wq, &wait);
+		__add_wait_queue(&ep->wq, &wait);
 
 		for (;;) {
 			/*
@@ -1536,7 +1539,7 @@ retry:
 			jtimeout = schedule_timeout(jtimeout);
 			write_lock_irqsave(&ep->lock, flags);
 		}
-		remove_wait_queue(&ep->wq, &wait);
+		__remove_wait_queue(&ep->wq, &wait);
 
 		set_current_state(TASK_RUNNING);
 	}
diff -puN include/linux/eventpoll.h~epoll-use-unlocked-wqueue-operations include/linux/eventpoll.h
--- devel/include/linux/eventpoll.h~epoll-use-unlocked-wqueue-operations	2006-06-02 18:13:48.000000000 -0700
+++ devel-akpm/include/linux/eventpoll.h	2006-06-02 18:13:48.000000000 -0700
@@ -1,6 +1,6 @@
 /*
  * include/linux/eventpoll.h ( Efficent event polling implementation )
- * Copyright (C) 2001,...,2003 Davide Libenzi
+ * Copyright (C) 2001,...,2006 Davide Libenzi
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
_

Patches currently in -mm which might be from davidel@xxxxxxxxxxxxxxx are

epoll-use-unlocked-wqueue-operations.patch