[nacked] epoll-fix-for-epoll_wait-sometimes-returning-events-on-closed-fds.patch removed from -mm tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



The patch titled
     epoll: fix for epoll_wait() sometimes returning events on closed fds
has been removed from the -mm tree.  Its filename was
     epoll-fix-for-epoll_wait-sometimes-returning-events-on-closed-fds.patch

This patch was dropped because it was nacked

The current -mm tree may be found at http://userweb.kernel.org/~akpm/mmotm/

------------------------------------------------------
Subject: epoll: fix for epoll_wait() sometimes returning events on closed fds
From: Tony Battersby <tonyb@xxxxxxxxxxxxxxx>

The epoll manpage:

Q: Will closing a file descriptor cause it to be removed from all
   epoll sets automatically?
A: Yes

sys_close() calls filp_close(), which calls fput().  If no one else holds
a reference to the file, then fput() calls __fput(), and __fput() calls
eventpoll_release(), which prevents epoll_wait() from returning events on
the fd to userspace.  In the rare case that sys_close() doesn't call
__fput() because someone else has a reference to the file, a subsequent
epoll_wait() may still unexpectedly return events on the fd after it has
been closed.  This can end up confusing or crashing a userspace program
that doesn't do epoll_ctl(EPOLL_CTL_DEL) before closing the fd.  I have
reports of this actually happening under heavy load with a program using
epoll with network sockets on 2.6.27.

This patch fixes the problem by calling eventpoll_release_file() from
filp_close() instead of from __fput().

The locking in eventpoll_release() and eventpoll_release_file() needs to
be changed because previously it relied on the fact that no one else could
have a reference to the file when called from __fput(), and this is no
longer true.  The new locking is admittedly ugly, but I believe it works.

ep_insert() now also needs to check if the file has been closed to avoid
races in multi-threaded programs where one thread is doing
epoll_ctl(EPOLL_CTL_ADD) and another thread is closing the fd.  This is
done by checking that fget_light still returns the same struct file * as
before.

Note that the list_del_init(&epi->fllink) previously done in
eventpoll_release_file() was unnecessary because it is also done by
ep_remove().

Userspace programs that might run on kernels with this bug can work around
the problem by doing epoll_ctl(EPOLL_CTL_DEL) before close().

Signed-off-by: Tony Battersby <tonyb@xxxxxxxxxxxxxxx>
Cc: <stable@xxxxxxxxxx>
Cc: Michael Kerrisk <mtk.manpages@xxxxxxxxxxxxxx>
Cc: Davide Libenzi <davidel@xxxxxxxxxxxxxxx>
Cc: Jonathan Corbet <corbet@xxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 fs/eventpoll.c            |   66 ++++++++++++++++++++++++++++--------
 fs/file_table.c           |    5 --
 fs/open.c                 |    2 +
 include/linux/eventpoll.h |   16 ++++----
 4 files changed, 64 insertions(+), 25 deletions(-)

diff -puN fs/eventpoll.c~epoll-fix-for-epoll_wait-sometimes-returning-events-on-closed-fds fs/eventpoll.c
--- a/fs/eventpoll.c~epoll-fix-for-epoll_wait-sometimes-returning-events-on-closed-fds
+++ a/fs/eventpoll.c
@@ -538,27 +538,40 @@ void eventpoll_release_file(struct file 
 	struct epitem *epi;
 
 	/*
-	 * We don't want to get "file->f_ep_lock" because it is not
-	 * necessary. It is not necessary because we're in the "struct file"
-	 * cleanup path, and this means that noone is using this file anymore.
-	 * So, for example, epoll_ctl() cannot hit here sicne if we reach this
-	 * point, the file counter already went to zero and fget() would fail.
-	 * The only hit might come from ep_free() but by holding the mutex
-	 * will correctly serialize the operation. We do need to acquire
-	 * "ep->mtx" after "epmutex" because ep_remove() requires it when called
-	 * from anywhere but ep_free().
+	 * epmutex makes it safe to use *ep by preventing ep_free() from
+	 * running.
 	 */
 	mutex_lock(&epmutex);
 
+	spin_lock(&file->f_ep_lock);
 	while (!list_empty(lsthead)) {
-		epi = list_first_entry(lsthead, struct epitem, fllink);
+		ep = list_first_entry(lsthead, struct epitem, fllink)->ep;
+		spin_unlock(&file->f_ep_lock);
 
-		ep = epi->ep;
-		list_del_init(&epi->fllink);
+		/*
+		 * ep->mtx protects against epoll_ctl() and is required for
+		 * calling ep_remove().
+		 */
 		mutex_lock(&ep->mtx);
-		ep_remove(ep, epi);
+
+		spin_lock(&file->f_ep_lock);
+		epi = list_first_entry(lsthead, struct epitem, fllink);
+		spin_unlock(&file->f_ep_lock);
+
+		/*
+		 * epoll_ctl(EPOLL_CTL_DEL) may have modified the list before
+		 * we locked ep->mtx, so it is necessary to check that the
+		 * first entry still exists and belongs to the ep whose mtx we
+		 * are holding.
+		 */
+		if (epi && epi->ep == ep)
+			ep_remove(ep, epi);
+
 		mutex_unlock(&ep->mtx);
+
+		spin_lock(&file->f_ep_lock);
 	}
+	spin_unlock(&file->f_ep_lock);
 
 	mutex_unlock(&epmutex);
 }
@@ -736,6 +749,22 @@ static void ep_rbtree_insert(struct even
 }
 
 /*
+ * Returns true if the fd has been closed or false if not.
+ */
+static bool ep_is_file_closed(struct file *file, int fd)
+{
+	struct file *file2;
+	int needs_fput;
+	bool closed;
+
+	file2 = fget_light(fd, &needs_fput);
+	closed = (file2 != file);
+	if (file2)
+		fput_light(file2, needs_fput);
+	return closed;
+}
+
+/*
  * Must be called with "mtx" held.
  */
 static int ep_insert(struct eventpoll *ep, struct epoll_event *event,
@@ -786,6 +815,17 @@ static int ep_insert(struct eventpoll *e
 
 	/* Add the current item to the list of active epoll hook for this file */
 	spin_lock(&tfile->f_ep_lock);
+	/*
+	 * Don't add the fd to the epoll set if it was closed in between the
+	 * initial fget() and the time we get here.  This check must be done
+	 * while holding f_ep_lock to prevent races with
+	 * eventpoll_release_file().
+	 */
+	if (ep_is_file_closed(tfile, fd)) {
+		spin_unlock(&tfile->f_ep_lock);
+		error = -EBADF;
+		goto error_unregister;
+	}
 	list_add_tail(&epi->fllink, &tfile->f_ep_links);
 	spin_unlock(&tfile->f_ep_lock);
 
diff -puN fs/file_table.c~epoll-fix-for-epoll_wait-sometimes-returning-events-on-closed-fds fs/file_table.c
--- a/fs/file_table.c~epoll-fix-for-epoll_wait-sometimes-returning-events-on-closed-fds
+++ a/fs/file_table.c
@@ -266,11 +266,6 @@ void __fput(struct file *file)
 	might_sleep();
 
 	fsnotify_close(file);
-	/*
-	 * The function eventpoll_release() should be the first called
-	 * in the file cleanup chain.
-	 */
-	eventpoll_release(file);
 	locks_remove_flock(file);
 
 	if (unlikely(file->f_flags & FASYNC)) {
diff -puN fs/open.c~epoll-fix-for-epoll_wait-sometimes-returning-events-on-closed-fds fs/open.c
--- a/fs/open.c~epoll-fix-for-epoll_wait-sometimes-returning-events-on-closed-fds
+++ a/fs/open.c
@@ -29,6 +29,7 @@
 #include <linux/rcupdate.h>
 #include <linux/audit.h>
 #include <linux/falloc.h>
+#include <linux/eventpoll.h>
 
 int vfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 {
@@ -1104,6 +1105,7 @@ int filp_close(struct file *filp, fl_own
 
 	dnotify_flush(filp, id);
 	locks_remove_posix(filp, id);
+	eventpoll_release(filp);
 	fput(filp);
 	return retval;
 }
diff -puN include/linux/eventpoll.h~epoll-fix-for-epoll_wait-sometimes-returning-events-on-closed-fds include/linux/eventpoll.h
--- a/include/linux/eventpoll.h~epoll-fix-for-epoll_wait-sometimes-returning-events-on-closed-fds
+++ a/include/linux/eventpoll.h
@@ -69,23 +69,25 @@ static inline void eventpoll_init_file(s
 void eventpoll_release_file(struct file *file);
 
 /*
- * This is called from inside fs/file_table.c:__fput() to unlink files
+ * This is called from inside fs/open.c:filp_close() to unlink files
  * from the eventpoll interface. We need to have this facility to cleanup
  * correctly files that are closed without being removed from the eventpoll
  * interface.
  */
 static inline void eventpoll_release(struct file *file)
 {
+	bool empty;
 
 	/*
-	 * Fast check to avoid the get/release of the semaphore. Since
-	 * we're doing this outside the semaphore lock, it might return
+	 * Fast check to avoid the get/release of the mutex. Since
+	 * we're doing this outside the mutex lock, it might return
 	 * false negatives, but we don't care. It'll help in 99.99% of cases
-	 * to avoid the semaphore lock. False positives simply cannot happen
-	 * because the file in on the way to be removed and nobody ( but
-	 * eventpoll ) has still a reference to this file.
+	 * to avoid the mutex lock.
 	 */
-	if (likely(list_empty(&file->f_ep_links)))
+	spin_lock(&file->f_ep_lock);
+	empty = list_empty(&file->f_ep_links);
+	spin_unlock(&file->f_ep_lock);
+	if (likely(empty))
 		return;
 
 	/*
_

Patches currently in -mm which might be from tonyb@xxxxxxxxxxxxxxx are

linux-next.patch
vfs-improve-comment-describing-fget_light.patch
epoll-fix-for-epoll_wait-sometimes-returning-events-on-closed-fds.patch
make-shm_get_stat-more-robust.patch

--
To unsubscribe from this list: send the line "unsubscribe mm-commits" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Kernel Newbies FAQ]     [Kernel Archive]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [Bugtraq]     [Photo]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]

  Powered by Linux