The patch titled
     Subject: mm: don't expose page to fast gup before it's ready
has been removed from the -mm tree.  Its filename was
     mm-dont-expose-page-to-fast-gup-before-its-ready.patch

This patch was dropped because an updated version will be merged

------------------------------------------------------
From: Yu Zhao <yuzhao@xxxxxxxxxx>
Subject: mm: don't expose page to fast gup before it's ready

We don't want to expose a page to the fast gup path before it has been
properly set up.  During page setup we may call page_add_new_anon_rmap(),
which uses a non-atomic bit op.  If the page is exposed before that is
done, we could overwrite page flags that are set by get_user_pages_fast()
or its callers.  Here is a non-fatal scenario (there might be other fatal
problems that I didn't look into):

	CPU 1				CPU 2
	set_pte_at()			get_user_pages_fast()
	  page_add_new_anon_rmap()	  gup_pte_range()
	    __SetPageSwapBacked()	    SetPageReferenced()

Fix the problem by delaying set_pte_at() until the page is ready.

I didn't observe the race directly.  But I did get a few crashes when
trying to access the mem_cgroup of pages returned by
get_user_pages_fast().  Those pages were charged and they showed valid
mem_cgroup in kdumps.  So this led me to think the problem came from the
premature set_pte_at().

I think the fact that nobody has complained about this problem is because
the race only happens when using ksm+swap, and even then it might not
cause any fatal problem.  Nevertheless, it's nice to have set_pte_at()
done consistently after the rmap is added and the page is charged.

Link: http://lkml.kernel.org/r/20180108225632.16332-1-yuzhao@xxxxxxxxxx
Signed-off-by: Yu Zhao <yuzhao@xxxxxxxxxx>
Cc: Jan Kara <jack@xxxxxxx>
Cc: Minchan Kim <minchan@xxxxxxxxxx>
Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
Cc: Vladimir Davydov <vdavydov.dev@xxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 mm/memory.c   |    2 +-
 mm/swapfile.c |    4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

--- a/mm/memory.c~mm-dont-expose-page-to-fast-gup-before-its-ready
+++ a/mm/memory.c
@@ -2884,7 +2884,6 @@ vm_fault_t do_swap_page(struct vm_fault
 	flush_icache_page(vma, page);
 	if (pte_swp_soft_dirty(vmf->orig_pte))
 		pte = pte_mksoft_dirty(pte);
-	set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte);
 	arch_do_swap_page(vma->vm_mm, vma, vmf->address, pte,
 			vmf->orig_pte);
 	vmf->orig_pte = pte;
@@ -2898,6 +2897,7 @@ vm_fault_t do_swap_page(struct vm_fault
 		mem_cgroup_commit_charge(page, memcg, true, false);
 		activate_page(page);
 	}
+	set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte);

 	swap_free(entry);
 	if (mem_cgroup_swap_full(page) ||
--- a/mm/swapfile.c~mm-dont-expose-page-to-fast-gup-before-its-ready
+++ a/mm/swapfile.c
@@ -1880,8 +1880,6 @@ static int unuse_pte(struct vm_area_stru
 	dec_mm_counter(vma->vm_mm, MM_SWAPENTS);
 	inc_mm_counter(vma->vm_mm, MM_ANONPAGES);
 	get_page(page);
-	set_pte_at(vma->vm_mm, addr, pte,
-		   pte_mkold(mk_pte(page, vma->vm_page_prot)));
 	if (page == swapcache) {
 		page_add_anon_rmap(page, vma, addr, false);
 		mem_cgroup_commit_charge(page, memcg, true, false);
@@ -1890,6 +1888,8 @@ static int unuse_pte(struct vm_area_stru
 		mem_cgroup_commit_charge(page, memcg, false, false);
 		lru_cache_add_active_or_unevictable(page, vma);
 	}
+	set_pte_at(vma->vm_mm, addr, pte,
+		   pte_mkold(mk_pte(page, vma->vm_page_prot)));
 	swap_free(entry);
 	/*
 	 * Move the page to the active list so it is not
_

Patches currently in -mm which might be from yuzhao@xxxxxxxxxx are

mm-replace-list_move_tail-with-add_page_to_lru_list_tail.patch
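
As an aside, below is a minimal userspace sketch of the failure mode the
changelog describes: a plain (non-atomic) read-modify-write of a flags
word, as __SetPageSwapBacked() does during page setup, racing with an
atomic bit set on the same word, as SetPageReferenced() does from
gup_pte_range().  This is not kernel code; the names fake_flags,
FAKE_SWAPBACKED and FAKE_REFERENCED are invented for illustration, and the
actual fix is simply the reordering of set_pte_at() shown in the diff
above.

/* Build with something like: gcc -O2 -pthread race-sketch.c */
#include <pthread.h>
#include <stdio.h>

#define FAKE_SWAPBACKED	(1UL << 0)	/* set non-atomically, like __SetPageSwapBacked() */
#define FAKE_REFERENCED	(1UL << 1)	/* set atomically, like SetPageReferenced() */

static unsigned long fake_flags;

static void *setup_side(void *arg)
{
	/* Plain load/modify/store, as a non-atomic __set_bit() would do. */
	unsigned long old = fake_flags;		/* load */
	fake_flags = old | FAKE_SWAPBACKED;	/* store may clobber a concurrent update */
	return NULL;
}

static void *gup_side(void *arg)
{
	/* Atomic OR, as an atomic set_bit() would do. */
	__atomic_fetch_or(&fake_flags, FAKE_REFERENCED, __ATOMIC_RELAXED);
	return NULL;
}

int main(void)
{
	int lost = 0;

	for (int i = 0; i < 100000; i++) {
		pthread_t a, b;

		fake_flags = 0;
		pthread_create(&a, NULL, setup_side, NULL);
		pthread_create(&b, NULL, gup_side, NULL);
		pthread_join(a, NULL);
		pthread_join(b, NULL);

		/*
		 * If setup_side()'s store landed after gup_side()'s atomic
		 * OR, the "referenced" bit has been overwritten even though
		 * it was set atomically.
		 */
		if (!(fake_flags & FAKE_REFERENCED))
			lost++;
	}
	printf("referenced bit lost %d times out of 100000\n", lost);
	return 0;
}

How often the bit is actually lost depends on timing, but any non-zero
count shows why the page must not be visible to fast gup until the
non-atomic setup is finished.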