Patch "rcu: Back off upon fill_page_cache_func() allocation failure" has been added to the 5.15-stable tree


 



This is a note to let you know that I've just added the patch titled

    rcu: Back off upon fill_page_cache_func() allocation failure

to the 5.15-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     rcu-back-off-upon-fill_page_cache_func-allocation-fa.patch
and it can be found in the queue-5.15 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit fec7f1370335e581a69364b3643c5d0406717771
Author: Michal Hocko <mhocko@xxxxxxxx>
Date:   Wed Jun 22 13:47:11 2022 +0200

    rcu: Back off upon fill_page_cache_func() allocation failure
    
    [ Upstream commit 093590c16b447f53e66771c8579ae66c96f6ef61 ]
    
    The fill_page_cache_func() function allocates a couple of pages to store
    kvfree_rcu_bulk_data structures. This is a lightweight (GFP_NORETRY)
    allocation which can fail under memory pressure. The function will,
    however, keep retrying even when the previous attempt has failed.
    
    This retrying is in theory correct, but in practice the allocation is
    invoked from workqueue context, which means that if the memory reclaim
    gets stuck, these retries can hog the worker for quite some time.
    Although the workqueues subsystem automatically adjusts concurrency, such
    adjustment is not guaranteed to happen until the worker context sleeps.
    And the fill_page_cache_func() function's retry loop is not guaranteed
    to sleep (see the should_reclaim_retry() function).
    
    And we have seen this function cause workqueue lockups:
    
    kernel: BUG: workqueue lockup - pool cpus=93 node=1 flags=0x1 nice=0 stuck for 32s!
    [...]
    kernel: pool 74: cpus=37 node=0 flags=0x1 nice=0 hung=32s workers=2 manager: 2146
    kernel:   pwq 498: cpus=249 node=1 flags=0x1 nice=0 active=4/256 refcnt=5
    kernel:     in-flight: 1917:fill_page_cache_func
    kernel:     pending: dbs_work_handler, free_work, kfree_rcu_monitor
    
    Originally, we thought that the root cause of this lockup was several
    retries with direct reclaim, but this is not yet confirmed.  Furthermore,
    we have seen similar lockups without any heavy memory pressure.  This
    suggests that there are other factors contributing to these lockups.
    However, it is not really clear that endless retries are desirable.
    
    So let's make the fill_page_cache_func() function back off after
    allocation failure.
    
    Cc: Uladzislau Rezki (Sony) <urezki@xxxxxxxxx>
    Cc: "Paul E. McKenney" <paulmck@xxxxxxxxxx>
    Cc: Frederic Weisbecker <frederic@xxxxxxxxxx>
    Cc: Neeraj Upadhyay <quic_neeraju@xxxxxxxxxxx>
    Cc: Josh Triplett <josh@xxxxxxxxxxxxxxxx>
    Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
    Cc: Lai Jiangshan <jiangshanlai@xxxxxxxxx>
    Cc: Joel Fernandes <joel@xxxxxxxxxxxxxxxxx>
    Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>
    Reviewed-by: Uladzislau Rezki (Sony) <urezki@xxxxxxxxx>
    Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index a4a9d68b1fdc..63f7ce228cc3 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3419,15 +3419,16 @@ static void fill_page_cache_func(struct work_struct *work)
 		bnode = (struct kvfree_rcu_bulk_data *)
 			__get_free_page(GFP_KERNEL | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);
 
-		if (bnode) {
-			raw_spin_lock_irqsave(&krcp->lock, flags);
-			pushed = put_cached_bnode(krcp, bnode);
-			raw_spin_unlock_irqrestore(&krcp->lock, flags);
+		if (!bnode)
+			break;
 
-			if (!pushed) {
-				free_page((unsigned long) bnode);
-				break;
-			}
+		raw_spin_lock_irqsave(&krcp->lock, flags);
+		pushed = put_cached_bnode(krcp, bnode);
+		raw_spin_unlock_irqrestore(&krcp->lock, flags);
+
+		if (!pushed) {
+			free_page((unsigned long) bnode);
+			break;
 		}
 	}
 
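
For readers who want to see the control-flow change outside the kernel, here is a minimal user-space sketch of the loop before and after the patch. It is an illustration only, not the kernel code: struct page_cache, try_alloc_page(), put_cached() and fill_cache() are hypothetical stand-ins for krcp, __get_free_page(), put_cached_bnode() and fill_page_cache_func(), and the "budget" counter is an assumed way to simulate allocation failure under memory pressure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-CPU page cache; not a kernel type. */
struct page_cache {
	void *slots[8];
	size_t nr;       /* pages currently cached */
	size_t capacity; /* assumed fill target, must be <= 8 */
};

/* Simulated allocator: succeeds while *budget > 0, then returns NULL. */
static void *try_alloc_page(int *budget)
{
	if (*budget <= 0)
		return NULL;
	(*budget)--;
	return malloc(4096);
}

/* Stand-in for put_cached_bnode(): false when the cache is already full. */
static bool put_cached(struct page_cache *pc, void *page)
{
	if (pc->nr >= pc->capacity)
		return false;
	pc->slots[pc->nr++] = page;
	return true;
}

/*
 * Fill the cache and return the number of allocation attempts made.
 * back_off == false models the old loop (a failed allocation just moves
 * on to the next attempt); back_off == true models the patched loop
 * (the first failed allocation ends the loop).
 */
static size_t fill_cache(struct page_cache *pc, int *budget, bool back_off)
{
	size_t attempts = 0;

	for (size_t i = 0; i < pc->capacity; i++) {
		void *page = try_alloc_page(budget);

		attempts++;
		if (!page) {
			if (back_off)
				break;
			continue;
		}

		if (!put_cached(pc, page)) {
			free(page);
			break;
		}
	}
	return attempts;
}

/* Small self-check contrasting the two behaviours. */
static void demo(void)
{
	struct page_cache a = { .nr = 0, .capacity = 5 };
	struct page_cache b = { .nr = 0, .capacity = 5 };
	int budget_a = 2, budget_b = 2;

	/* Old loop: keeps attempting after the allocator starts failing. */
	assert(fill_cache(&a, &budget_a, false) == 5);
	/* Patched loop: stops at the first failed allocation. */
	assert(fill_cache(&b, &budget_b, true) == 3);
	/* Both variants end up caching the same two pages. */
	assert(a.nr == 2 && b.nr == 2);

	for (size_t i = 0; i < a.nr; i++)
		free(a.slots[i]);
	for (size_t i = 0; i < b.nr; i++)
		free(b.slots[i]);
}
```

In this sketch both variants cache the same number of pages; the only difference is how many further attempts are made once the allocator has started failing, which is the behaviour the patch removes from the workqueue worker.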


