Patch "xfs: explicitly specify cpu when forcing inodegc delayed work to run immediately" has been added to the 5.15-stable tree

Sasha Levin <sashal@xxxxxxxxxx> · Mon, 25 Sep 2023 15:12:01 -0400

This is a note to let you know that I've just added the patch titled

    xfs: explicitly specify cpu when forcing inodegc delayed work to run immediately

to the 5.15-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     xfs-explicitly-specify-cpu-when-forcing-inodegc-dela.patch
and it can be found in the queue-5.15 subdirectory.

If you, or anyone else, feel it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit b210ea2535b4205d2e826b1d6d4b82d4d8917c79
Author: Darrick J. Wong <djwong@xxxxxxxxxx>
Date:   Thu Sep 21 18:01:53 2023 -0700

    xfs: explicitly specify cpu when forcing inodegc delayed work to run immediately
    
    [ Upstream commit 03e0add80f4cf3f7393edb574eeb3a89a1db7758 ]
    
    I've been noticing odd racing behavior in the inodegc code that could
    only be explained by one cpu adding an inode to its inactivation llist
    at the same time that another cpu is processing that cpu's llist.
    Preemption is disabled between get/put_cpu_ptr, so the only explanation
    is scheduler mayhem.  I inserted the following debug code into
    xfs_inodegc_worker (see the next patch):
    
            ASSERT(gc->cpu == smp_processor_id());
    
    This assertion tripped during overnight tests on the arm64 machines, but
    curiously not on x86_64.  I think we haven't observed any resource leaks
    here because the lockfree list code can handle simultaneous llist_add
    and llist_del_all functions operating on the same list.  However, the
    whole point of having percpu inodegc lists is to take advantage of warm
    memory caches by inactivating inodes on the last processor to touch the
    inode.
    
    The incorrect scheduling seems to occur after an inodegc worker is
    subjected to mod_delayed_work().  This wraps mod_delayed_work_on with
    WORK_CPU_UNBOUND specified as the cpu number.  Unbound allows for
    scheduling on any cpu, not necessarily the same one that scheduled the
    work.
    
    Because preemption is disabled for as long as we have the gc pointer, I
    think it's safe to use current_cpu() (aka smp_processor_id) to queue the
    delayed work item on the correct cpu.
    
    Fixes: 7cf2b0f9611b ("xfs: bound maximum wait time for inodegc work")
    Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
    Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Signed-off-by: Dave Chinner <david@xxxxxxxxxxxxx>
    Signed-off-by: Leah Rumancik <leah.rumancik@xxxxxxxxx>
    Acked-by: Darrick J. Wong <djwong@xxxxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index e9ebfe6f80150..ab8181f8d08a9 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -2057,7 +2057,8 @@ xfs_inodegc_queue(
 		queue_delay = 0;
 
 	trace_xfs_inodegc_queue(mp, __return_address);
-	mod_delayed_work(mp->m_inodegc_wq, &gc->work, queue_delay);
+	mod_delayed_work_on(current_cpu(), mp->m_inodegc_wq, &gc->work,
+			queue_delay);
 	put_cpu_ptr(gc);
 
 	if (xfs_inodegc_want_flush_work(ip, items, shrinker_hits)) {
@@ -2101,7 +2102,8 @@ xfs_inodegc_cpu_dead(
 
 	if (xfs_is_inodegc_enabled(mp)) {
 		trace_xfs_inodegc_queue(mp, __return_address);
-		mod_delayed_work(mp->m_inodegc_wq, &gc->work, 0);
+		mod_delayed_work_on(current_cpu(), mp->m_inodegc_wq, &gc->work,
+				0);
 	}
 	put_cpu_ptr(gc);
 }
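
For readers who want to see why the wrapper loses the cpu binding, here
is a minimal userspace sketch of the behavior the commit message
describes. It is a toy model, not kernel code: every toy_* name and the
TOY_* constants are hypothetical and exist only for illustration. The
one fact taken from the commit message is that mod_delayed_work() calls
mod_delayed_work_on() with WORK_CPU_UNBOUND as the cpu argument.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy userspace model of the queueing decision. All toy_* names are
 * hypothetical; none of this is the kernel workqueue API.
 */
enum {
	TOY_NR_CPUS = 4,
	TOY_CPU_UNBOUND = TOY_NR_CPUS,	/* marker meaning "any cpu will do" */
};

struct toy_delayed_work {
	int owner_cpu;	/* cpu whose per-cpu list this work item drains */
	int target_cpu;	/* cpu most recently requested for this work item */
};

/* Models mod_delayed_work_on(): remember the cpu the caller asked for. */
static void toy_mod_delayed_work_on(int cpu, struct toy_delayed_work *dwork)
{
	dwork->target_cpu = cpu;
}

/* Models mod_delayed_work(): the wrapper always passes the unbound marker. */
static void toy_mod_delayed_work(struct toy_delayed_work *dwork)
{
	toy_mod_delayed_work_on(TOY_CPU_UNBOUND, dwork);
}

/* True only when the work item is pinned to the cpu that owns the list. */
static bool toy_runs_on_owner(const struct toy_delayed_work *dwork)
{
	return dwork->target_cpu == dwork->owner_cpu;
}
```

In this model, calling toy_mod_delayed_work() on a work item owned by
cpu 2 leaves target_cpu set to TOY_CPU_UNBOUND, so toy_runs_on_owner()
returns false. Calling toy_mod_delayed_work_on(2, ...) restores the
binding, which is the effect the patch is after when it passes
current_cpu() explicitly.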