Patch "mm: memcontrol: fix stat-corrupting race in charge moving" has been added to the 5.4-stable tree

This is a note to let you know that I've just added the patch titled

    mm: memcontrol: fix stat-corrupting race in charge moving

to the 5.4-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     mm-memcontrol-fix-stat-corrupting-race-in-charge-mov.patch
and it can be found in the queue-5.4 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit f8b0eb75a6614755b5c5907dff8b8c8b537433a6
Author: Johannes Weiner <hannes@xxxxxxxxxxx>
Date:   Wed Jun 3 16:01:28 2020 -0700

    mm: memcontrol: fix stat-corrupting race in charge moving
    
    [ Upstream commit abb242f57196dbaa108271575353a0453f6834ef ]
    
    The move_lock is a per-memcg lock, but the VM accounting code that needs
    to acquire it comes from the page and follows page->mem_cgroup under RCU
    protection.  That means that the page becomes unlocked not when we drop
    the move_lock, but when we update page->mem_cgroup.  And that assignment
    doesn't imply any memory ordering.  If that pointer write gets reordered
    against the reads of the page state (page_mapped(), PageDirty() etc.),
    the state may change while we rely on it being stable, and we can end
    up corrupting the counters.
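    
    Schematically, the race looks like this (an illustrative sketch, not
    the kernel source verbatim; the dirty-page update on CPU 1 stands in
    for any path that modifies page state under lock_page_memcg()):
    
        /* CPU 0: mem_cgroup_move_account(), simplified */
        spin_lock_irqsave(&from->move_lock, flags);
        if (PageDirty(page)) {                  /* read page state... */
                __mod_lruvec_state(from_vec, NR_FILE_DIRTY, -nr_pages);
                __mod_lruvec_state(to_vec, NR_FILE_DIRTY, nr_pages);
        }
        page->mem_cgroup = to;                  /* ...then publish: a plain
                                                 * store the CPU may hoist
                                                 * above the state reads */
        spin_unlock_irqrestore(&from->move_lock, flags);
    
        /* CPU 1: concurrent stat updater */
        memcg = lock_page_memcg(page);          /* can observe 'to' early and
                                                 * lock to->move_lock, which
                                                 * CPU 0 never holds */
        SetPageDirty(page);                     /* page state changes while
                                                 * CPU 0 still samples it */
        __mod_lruvec_page_state(page, NR_FILE_DIRTY, 1);
        __unlock_page_memcg(memcg);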
    
    Place an SMP memory barrier to make sure we're done with all page state by
    the time the new page->mem_cgroup becomes visible.
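    
    Condensed from the hunk below, the fixed ordering on the move side
    becomes:
    
        /* sample page state and transfer the counters */
        if (!anon && page_mapped(page)) { ... }
        if (!anon && PageDirty(page)) { ... }
        smp_mb();                       /* all state reads above complete
                                         * before... */
        page->mem_cgroup = to;          /* ...the new memcg pointer
                                         * becomes visible */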
    
    Also replace the open-coded move_lock with a lock_page_memcg() to make it
    more obvious what we're serializing against.
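    
    For reference, lock_page_memcg() pins page->mem_cgroup by rechecking
    it under the move_lock, roughly like this (condensed from 5.4's
    mm/memcontrol.c; the mem_cgroup_disabled() and !memcg fast paths are
    elided):
    
        rcu_read_lock();
    again:
        memcg = page->mem_cgroup;
        if (atomic_read(&memcg->moving_account) <= 0)
                return memcg;                   /* no move in flight */
        spin_lock_irqsave(&memcg->move_lock, flags);
        if (memcg != page->mem_cgroup) {
                /* the page moved under us: retry against the new memcg */
                spin_unlock_irqrestore(&memcg->move_lock, flags);
                goto again;
        }
    
    An updater that observes the new pointer therefore locks to->move_lock,
    and the smp_mb() on the move side guarantees the mover has finished
    sampling the old page state by the time that pointer is visible.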
    
    Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>
    Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
    Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx>
    Reviewed-by: Shakeel Butt <shakeelb@xxxxxxxxxx>
    Cc: Alex Shi <alex.shi@xxxxxxxxxxxxxxxxx>
    Cc: Hugh Dickins <hughd@xxxxxxxxxx>
    Cc: "Kirill A. Shutemov" <kirill@xxxxxxxxxxxxx>
    Cc: Michal Hocko <mhocko@xxxxxxxx>
    Cc: Roman Gushchin <guro@xxxxxx>
    Cc: Balbir Singh <bsingharora@xxxxxxxxx>
    Link: http://lkml.kernel.org/r/20200508183105.225460-3-hannes@xxxxxxxxxxx
    Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
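
The patched function below ends up with this overall shape (a condensed
view, not the verbatim source):

        lock_page_memcg(page);          /* locks 'from', the page's
                                         * current memcg */
        /* ... transfer the stat counters from 'from' to 'to' ... */
        smp_mb();                       /* finish all page state access */
        page->mem_cgroup = to;          /* page now resolves to 'to' */
        __unlock_page_memcg(from);      /* must name 'from' explicitly:
                                         * unlock_page_memcg(page) would
                                         * look up 'to', which was never
                                         * locked here */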

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 402c8bc65e08d..ca1632850fb76 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5489,7 +5489,6 @@ static int mem_cgroup_move_account(struct page *page,
 {
 	struct lruvec *from_vec, *to_vec;
 	struct pglist_data *pgdat;
-	unsigned long flags;
 	unsigned int nr_pages = compound ? hpage_nr_pages(page) : 1;
 	int ret;
 	bool anon;
@@ -5516,18 +5515,13 @@ static int mem_cgroup_move_account(struct page *page,
 	from_vec = mem_cgroup_lruvec(pgdat, from);
 	to_vec = mem_cgroup_lruvec(pgdat, to);
 
-	spin_lock_irqsave(&from->move_lock, flags);
+	lock_page_memcg(page);
 
 	if (!anon && page_mapped(page)) {
 		__mod_lruvec_state(from_vec, NR_FILE_MAPPED, -nr_pages);
 		__mod_lruvec_state(to_vec, NR_FILE_MAPPED, nr_pages);
 	}
 
-	/*
-	 * move_lock grabbed above and caller set from->moving_account, so
-	 * mod_memcg_page_state will serialize updates to PageDirty.
-	 * So mapping should be stable for dirty pages.
-	 */
 	if (!anon && PageDirty(page)) {
 		struct address_space *mapping = page_mapping(page);
 
@@ -5543,15 +5537,23 @@ static int mem_cgroup_move_account(struct page *page,
 	}
 
 	/*
+	 * All state has been migrated, let's switch to the new memcg.
+	 *
 	 * It is safe to change page->mem_cgroup here because the page
-	 * is referenced, charged, and isolated - we can't race with
-	 * uncharging, charging, migration, or LRU putback.
+	 * is referenced, charged, isolated, and locked: we can't race
+	 * with (un)charging, migration, LRU putback, or anything else
+	 * that would rely on a stable page->mem_cgroup.
+	 *
+	 * Note that lock_page_memcg is a memcg lock, not a page lock,
+	 * to save space. As soon as we switch page->mem_cgroup to a
+	 * new memcg that isn't locked, the above state can change
+	 * concurrently again. Make sure we're truly done with it.
 	 */
+	smp_mb();
 
-	/* caller should have done css_get */
-	page->mem_cgroup = to;
+	page->mem_cgroup = to; 	/* caller should have done css_get */
 
-	spin_unlock_irqrestore(&from->move_lock, flags);
+	__unlock_page_memcg(from);
 
 	ret = 0;
 


