Patch "epoll: autoremove wakers even more aggressively" has been added to the 5.19-stable tree

This is a note to let you know that I've just added the patch titled

    epoll: autoremove wakers even more aggressively

to the 5.19-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     epoll-autoremove-wakers-even-more-aggressively.patch
and it can be found in the queue-5.19 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit a243ba1c4a932a2513ab0222b5c31d019538b199
Author: Benjamin Segall <bsegall@xxxxxxxxxx>
Date:   Wed Jun 15 14:24:23 2022 -0700

    epoll: autoremove wakers even more aggressively
    
    [ Upstream commit a16ceb13961068f7209e34d7984f8e42d2c06159 ]
    
    If a process is killed or otherwise exits while having active network
    connections and many threads waiting on epoll_wait, the threads will all
    be woken immediately, but not removed from ep->wq.  Then when network
    traffic scans ep->wq in wake_up, every wakeup attempt will fail, and will
    not remove the entries from the list.
    
    This means that the cost of the wakeup attempt is far higher than usual,
    does not decrease, and this also competes with the dying threads trying to
    actually make progress and remove themselves from the wq.
    
    Handle this by removing visited epoll wq entries unconditionally, rather
    than only when the wakeup succeeds - the structure of ep_poll means that
    the only potential loss is the timed_out->eavail heuristic, which now can
    race and result in a redundant ep_send_events attempt.  (But only when
    incoming data and a timeout actually race, not on every timeout)
    
    Shakeel added:
    
    : We are seeing this issue in production with real workloads and it has
    : caused hard lockups.  Particularly network heavy workloads with a lot
    : of threads in epoll_wait() can easily trigger this issue if they get
    : killed (oom-killed in our case).
    
    Link: https://lkml.kernel.org/r/xm26fsjotqda.fsf@xxxxxxxxxx
    Signed-off-by: Ben Segall <bsegall@xxxxxxxxxx>
    Tested-by: Shakeel Butt <shakeelb@xxxxxxxxxx>
    Cc: Alexander Viro <viro@xxxxxxxxxxxxxxxxxx>
    Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
    Cc: Shakeel Butt <shakeelb@xxxxxxxxxx>
    Cc: Eric Dumazet <edumazet@xxxxxxxxxx>
    Cc: Roman Penyaev <rpenyaev@xxxxxxx>
    Cc: Jason Baron <jbaron@xxxxxxxxxx>
    Cc: Khazhismel Kumykov <khazhy@xxxxxxxxxx>
    Cc: Heiher <r@xxxxxx>
    Cc: <stable@xxxxxxxxxx>
    Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index e2daa940ebce..8b56b94e2f56 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1747,6 +1747,21 @@ static struct timespec64 *ep_timeout_to_timespec(struct timespec64 *to, long ms)
 	return to;
 }
 
+/*
+ * autoremove_wake_function, but remove even on failure to wake up, because we
+ * know that default_wake_function/ttwu will only fail if the thread is already
+ * woken, and in that case the ep_poll loop will remove the entry anyways, not
+ * try to reuse it.
+ */
+static int ep_autoremove_wake_function(struct wait_queue_entry *wq_entry,
+				       unsigned int mode, int sync, void *key)
+{
+	int ret = default_wake_function(wq_entry, mode, sync, key);
+
+	list_del_init(&wq_entry->entry);
+	return ret;
+}
+
 /**
  * ep_poll - Retrieves ready events, and delivers them to the caller-supplied
  *           event buffer.
@@ -1828,8 +1843,15 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 		 * normal wakeup path no need to call __remove_wait_queue()
 		 * explicitly, thus ep->lock is not taken, which halts the
 		 * event delivery.
+		 *
+		 * In fact, we now use an even more aggressive function that
+		 * unconditionally removes, because we don't reuse the wait
+		 * entry between loop iterations. This lets us also avoid the
+		 * performance issue if a process is killed, causing all of its
+		 * threads to wake up without being removed normally.
 		 */
 		init_wait(&wait);
+		wait.func = ep_autoremove_wake_function;
 
 		write_lock_irq(&ep->lock);
 		/*

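To illustrate why removing an entry only on a successful wakeup is costly when every waiter has already been woken, here is a minimal userspace sketch. It is not kernel code: struct waiter, struct wait_queue, try_wake, wake_up_scan and total_visits are hypothetical names invented for this illustration. The model assumes exclusive waiters, so a scan stops at the first successful wakeup, and it assumes that all waiters were already woken by a kill before any network wakeup arrives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WAITERS 64

/* Hypothetical stand-in for a wait queue entry; not the kernel's struct. */
struct waiter {
	bool sleeping;	/* true until a wakeup succeeds */
	bool queued;	/* true while the entry is still on the queue */
};

struct wait_queue {
	struct waiter w[MAX_WAITERS];
	size_t n;
};

/* Models default_wake_function: succeeds only if the thread is still asleep. */
static int try_wake(struct waiter *w)
{
	if (!w->sleeping)
		return 0;
	w->sleeping = false;
	return 1;
}

/* Models autoremove_wake_function: dequeue only when the wakeup succeeded. */
static int wake_autoremove(struct waiter *w)
{
	int ret = try_wake(w);

	if (ret)
		w->queued = false;
	return ret;
}

/* Models the patched behaviour: dequeue whether or not the wakeup succeeded. */
static int wake_autoremove_always(struct waiter *w)
{
	int ret = try_wake(w);

	w->queued = false;
	return ret;
}

/* Queue n waiters that a kill has already woken but not yet dequeued. */
static void queue_killed_waiters(struct wait_queue *q, size_t n)
{
	assert(n <= MAX_WAITERS);
	q->n = n;
	for (size_t i = 0; i < n; i++) {
		q->w[i].sleeping = false;
		q->w[i].queued = true;
	}
}

/* One wakeup scan; returns how many queued entries it had to visit. */
static size_t wake_up_scan(struct wait_queue *q, int (*func)(struct waiter *))
{
	size_t visited = 0;

	for (size_t i = 0; i < q->n; i++) {
		if (!q->w[i].queued)
			continue;
		visited++;
		if (func(&q->w[i]))
			break;	/* exclusive waiter woken, stop scanning */
	}
	return visited;
}

/* Total entries visited over `scans` wakeups against n already-woken waiters. */
static size_t total_visits(size_t n, size_t scans,
			   int (*func)(struct waiter *))
{
	struct wait_queue q;
	size_t total = 0;

	queue_killed_waiters(&q, n);
	for (size_t s = 0; s < scans; s++)
		total += wake_up_scan(&q, func);
	return total;
}
```

In this model, total_visits(n, scans, wake_autoremove) grows as n * scans, because no wakeup ever succeeds and no entry is removed. total_visits(n, scans, wake_autoremove_always) is bounded by n, because the first scan empties the queue. The sketch leaves out locking and the dying threads removing themselves, both of which matter in the real ep_poll path.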

