+ mm-list_lru-fix-possible-race-in-memcg_reparent_list_lru_node.patch added to -mm tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



The patch titled
     Subject: mm/list_lru: fix possible race in memcg_reparent_list_lru_node()
has been added to the -mm tree.  Its filename is
     mm-list_lru-fix-possible-race-in-memcg_reparent_list_lru_node.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/mm-list_lru-fix-possible-race-in-memcg_reparent_list_lru_node.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/mm-list_lru-fix-possible-race-in-memcg_reparent_list_lru_node.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Waiman Long <longman@xxxxxxxxxx>
Subject: mm/list_lru: fix possible race in memcg_reparent_list_lru_node()

Muchun Song found out there could be a race between list_lru_add() and
memcg_reparent_list_lru_node() causing the later function to miss
reparenting of a lru entry as shown below:

CPU0:                           CPU1:
list_lru_add()
    spin_lock(&nlru->lock)
    l = list_lru_from_kmem(memcg)
                                memcg_reparent_objcgs(memcg)
                                memcg_reparent_list_lrus(memcg)
                                    memcg_reparent_list_lru()
                                        memcg_reparent_list_lru_node()
                                            if (!READ_ONCE(nlru->nr_items))
                                                // Miss reparenting
                                                return
    // Assume 0->1
    l->nr_items++
    // Assume 0->1
    nlru->nr_items++

Though it is not likely that a list_lru_node that has 0 item suddenly has
a newly added lru entry at the end of its life.  The race is still
theoretically possible.

With the lock/unlock pair used within the percpu_ref_kill() which is the
last function call of memcg_reparent_objcgs(), any read issued in
memcg_reparent_list_lru_node() will not be reordered before the
reparenting of objcgs.

Adding a !spin_is_locked()/smp_rmb()/!READ_ONCE(nlru->nr_items) check to
ensure that either the reading of nr_items is valid or the racing
list_lru_add() will see the reparented objcg.

Link: https://lkml.kernel.org/r/20220330172646.2687555-1-longman@xxxxxxxxxx
Fixes: 405cc51fc104 ("mm/list_lru: optimize memcg_reparent_list_lru_node()")
Reported-by: Muchun Song <songmuchun@xxxxxxxxxxxxx>
Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
Acked-by: Roman Gushchin <roman.gushchin@xxxxxxxxx>
Cc: Michal Hocko <mhocko@xxxxxxxx>
Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
Cc: Shakeel Butt <shakeelb@xxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 mm/list_lru.c |   31 +++++++++++++++++++++++++++----
 1 file changed, 27 insertions(+), 4 deletions(-)

--- a/mm/list_lru.c~mm-list_lru-fix-possible-race-in-memcg_reparent_list_lru_node
+++ a/mm/list_lru.c
@@ -395,10 +395,33 @@ static void memcg_reparent_list_lru_node
 	struct list_lru_one *src, *dst;
 
 	/*
-	 * If there is no lru entry in this nlru, we can skip it immediately.
+	 * With the lock/unlock pair used within the percpu_ref_kill()
+	 * which is the last function call of memcg_reparent_objcgs(), any
+	 * read issued here will not be reordered before the reparenting
+	 * of objcgs.
+	 *
+	 * Assuming a racing list_lru_add():
+	 * list_lru_add()
+	 *				<- memcg_reparent_list_lru_node()
+	 *   spin_lock(&nlru->lock)
+	 *   l = list_lru_from_kmem(memcg)
+	 *   nlru->nr_items++
+	 *   spin_unlock(&nlru->lock)
+	 *				<- memcg_reparent_list_lru_node()
+	 *
+	 * The !spin_is_locked(&nlru->lock) check is true means it is
+	 * either before the spin_lock() or after the spin_unlock(). In the
+	 * former case, list_lru_add() will see the reparented objcg and so
+	 * won't touch the lru to be reparented. In the later case, it will
+	 * see the updated nr_items. So we can use the optimization that if
+	 * there is no lru entry in this nlru, skip it immediately.
 	 */
-	if (!READ_ONCE(nlru->nr_items))
-		return;
+	if (!spin_is_locked(&nlru->lock)) {
+		/* nr_items read must be ordered after nlru->lock */
+		smp_rmb();
+		if (!READ_ONCE(nlru->nr_items))
+			return;
+	}
 
 	/*
 	 * Since list_lru_{add,del} may be called under an IRQ-safe lock,
@@ -407,7 +430,7 @@ static void memcg_reparent_list_lru_node
 	spin_lock_irq(&nlru->lock);
 
 	src = list_lru_from_memcg_idx(lru, nid, src_idx);
-	if (!src)
+	if (!src || !src->nr_items)
 		goto out;
 	dst = list_lru_from_memcg_idx(lru, nid, dst_idx);
 
_

Patches currently in -mm which might be from longman@xxxxxxxxxx are

mm-sparsemem-fix-mem_section-will-never-be-null-gcc-12-warning.patch
mm-list_lru-fix-possible-race-in-memcg_reparent_list_lru_node.patch
ipc-mqueue-use-get_tree_nodev-in-mqueue_get_tree.patch




[Index of Archives]     [Kernel Archive]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]

  Powered by Linux