Re: FAILED: patch "[PATCH] proc/pagemap: walk page tables under pte lock" failed to apply to 3.10-stable tree

On Sat, Feb 28, 2015 at 02:45:34PM -0800, gregkh@xxxxxxxxxxxxxxxxxxx wrote:
> 
> The patch below does not apply to the 3.10-stable tree.
> If someone wants it applied there, or to any other stable or longterm
> tree, then please email the backport, including the original git commit
> id to <stable@xxxxxxxxxxxxxxx>.

The main reason for the conflict is that the 3.10 tree lacks the following patch:

  commit 81d0fa623c5b8dbd5279d9713094b0f9b0a00fb4
  Author: Peter Feiner <pfeiner@xxxxxxxxxx>
  Date:   Thu Oct 9 15:28:32 2014 -0700
  
      mm: softdirty: unmapped addresses between VMAs are clean

but unlike v3.14, v3.10 has neither soft_dirty nor the split pmd lock, so some
manual resolution is needed to apply 81d0fa623c5b and 05fbf357d941 on top of
v3.10.70.

I think the resolution is mainly cosmetic, but I'm not 100% sure that the
altered patches (attached) introduce no instability.
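
To make the substance of the change easier to review, this is the pattern
both attached patches converge on (a condensed sketch only, not the literal
hunk; patch 2/2 below has the real code):

	orig_pte = pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
	for (; addr < min(end, vma->vm_end); pte++, addr += PAGE_SIZE) {
		pagemap_entry_t pme;

		/* reading *pte under ptl keeps migration entries stable */
		pte_to_pagemap_entry(&pme, vma, addr, *pte);
		err = add_to_pagemap(addr, &pme, pm);
		if (err)
			break;	/* don't return with ptl held */
	}
	pte_unmap_unlock(orig_pte, ptl);
	if (err)
		return err;

The one non-cosmetic detail is the break: the old loop returned directly on
error, which would now leave ptl locked, so the error return has to move
below pte_unmap_unlock().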

Peter, Konstantin, could you review the attached patches and confirm whether
they work correctly on v3.10.70?  Or if there is a good reason to safely skip
this backport, that's fine too.
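
If it is useful for checking, the race fixed by 2/2 might be exercised with
something like the following. This is my own untested sketch, not part of the
patches, and it assumes a machine with two online NUMA nodes: a child bounces
one of its anonymous pages between nodes with move_pages(2) while the parent
rereads that page's pagemap entry.

/*
 * Untested sketch (not part of the patches): stress /proc/pid/pagemap
 * against page migration.  Assumes NUMA nodes 0 and 1 are online and
 * CONFIG_MIGRATION=y.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

#define MPOL_MF_MOVE (1 << 1)	/* from include/uapi/linux/mempolicy.h */

int main(void)
{
	long pagesize = sysconf(_SC_PAGESIZE);
	char path[64];
	uint64_t pme;
	pid_t child;
	int fd;

	char *page = mmap(NULL, pagesize, PROT_READ | PROT_WRITE,
			  MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
	memset(page, 1, pagesize);	/* fault the page in */

	child = fork();
	if (child == 0) {
		/* Migrator: bounce our page between nodes 0 and 1 so the
		 * parent keeps racing with transient migration entries. */
		int i;

		for (i = 0; i < 100000; i++) {
			void *pages[1] = { page };
			int nodes[1] = { i & 1 };
			int status[1];

			syscall(SYS_move_pages, 0, 1UL, pages, nodes,
				status, MPOL_MF_MOVE);
		}
		_exit(0);
	}

	/* Reader: drop our COW copy so the child's mapcount stays at 1
	 * (move_pages() skips shared pages), then hammer the child's
	 * pagemap entry for 'page' until the child exits. */
	munmap(page, pagesize);
	snprintf(path, sizeof(path), "/proc/%d/pagemap", (int)child);
	fd = open(path, O_RDONLY);
	while (waitpid(child, NULL, WNOHANG) == 0)
		pread(fd, &pme, sizeof(pme),
		      (uintptr_t)page / pagesize * sizeof(pme));
	return 0;
}

On an unpatched kernel this has some chance of triggering
BUG_ON(!PageLocked(page)) in migration_entry_to_page(); with 2/2 applied it
should run to completion quietly.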

Thanks,
Naoya Horiguchi

> thanks,
> 
> greg k-h
> 
> ------------------ original commit in Linus's tree ------------------
> 
> From 05fbf357d94152171bc50f8a369390f1f16efd89 Mon Sep 17 00:00:00 2001
> From: Konstantin Khlebnikov <khlebnikov@xxxxxxxxxxxxxx>
> Date: Wed, 11 Feb 2015 15:27:31 -0800
> Subject: [PATCH] proc/pagemap: walk page tables under pte lock
> 
> Lockless access to pte in pagemap_pte_range() might race with page
> migration and trigger BUG_ON(!PageLocked()) in migration_entry_to_page():
> 
> CPU A (pagemap)                           CPU B (migration)
>                                           lock_page()
>                                           try_to_unmap(page, TTU_MIGRATION...)
>                                                make_migration_entry()
>                                                set_pte_at()
> <read *pte>
> pte_to_pagemap_entry()
>                                           remove_migration_ptes()
>                                           unlock_page()
>     if(is_migration_entry())
>         migration_entry_to_page()
>             BUG_ON(!PageLocked(page))
> 
> Also, a lockless read might be non-atomic if the pte is larger than word size.
> Other pte walkers (smaps, numa_maps, clear_refs) already lock ptes.
> 
> Fixes: 052fb0d635df ("proc: report file/anon bit in /proc/pid/pagemap")
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@xxxxxxxxxxxxxx>
> Reported-by: Andrey Ryabinin <a.ryabinin@xxxxxxxxxxx>
> Reviewed-by: Cyrill Gorcunov <gorcunov@xxxxxxxxxx>
> Acked-by: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
> Acked-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx>	[3.5+]
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> 
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index e6e0abeb5d12..eeab30fcffcc 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -1056,7 +1056,7 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>  	struct vm_area_struct *vma;
>  	struct pagemapread *pm = walk->private;
>  	spinlock_t *ptl;
> -	pte_t *pte;
> +	pte_t *pte, *orig_pte;
>  	int err = 0;
>  
>  	/* find the first VMA at or above 'addr' */
> @@ -1117,15 +1117,19 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>  		BUG_ON(is_vm_hugetlb_page(vma));
>  
>  		/* Addresses in the VMA. */
> -		for (; addr < min(end, vma->vm_end); addr += PAGE_SIZE) {
> +		orig_pte = pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
> +		for (; addr < min(end, vma->vm_end); pte++, addr += PAGE_SIZE) {
>  			pagemap_entry_t pme;
> -			pte = pte_offset_map(pmd, addr);
> +
>  			pte_to_pagemap_entry(&pme, pm, vma, addr, *pte);
> -			pte_unmap(pte);
>  			err = add_to_pagemap(addr, &pme, pm);
>  			if (err)
> -				return err;
> +				break;
>  		}
> +		pte_unmap_unlock(orig_pte, ptl);
> +
> +		if (err)
> +			return err;
>  
>  		if (addr == end)
>  			break;
> 
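
Here are the two patches adjusted for v3.10.70: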
From f2ca592ed5e4a915b8e4cdc9c2b02d906ccf3c89 Mon Sep 17 00:00:00 2001
From: Peter Feiner <pfeiner@xxxxxxxxxx>
Date: Mon, 2 Mar 2015 14:18:04 +0900
Subject: [PATCH 1/2] mm: softdirty: unmapped addresses between VMAs are clean

If a /proc/pid/pagemap read spans a [VMA, an unmapped region, then a
VM_SOFTDIRTY VMA], the virtual pages in the unmapped region are reported
as softdirty.  Here's a program to demonstrate the bug:

#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

int main() {
	const uint64_t PAGEMAP_SOFTDIRTY = 1ul << 55;
	uint64_t pme[3];
	int fd = open("/proc/self/pagemap", O_RDONLY);
	char *m = mmap(NULL, 3 * getpagesize(), PROT_READ,
	               MAP_ANONYMOUS | MAP_SHARED, -1, 0);
	munmap(m + getpagesize(), getpagesize());
	pread(fd, pme, 24, (unsigned long) m / getpagesize() * 8);
	assert(pme[0] & PAGEMAP_SOFTDIRTY);    /* passes */
	assert(!(pme[1] & PAGEMAP_SOFTDIRTY)); /* fails */
	assert(pme[2] & PAGEMAP_SOFTDIRTY);    /* passes */
	return 0;
}

(Note that all pages in new VMAs are softdirty until cleared).

Tested:
	Used the program given above. I'm going to include this code in
	a selftest in the future.

[n-horiguchi@xxxxxxxxxxxxx: backporting for v3.10.70]
[n-horiguchi@xxxxxxxxxxxxx: prevent pagemap_pte_range() from overrunning]
Signed-off-by: Peter Feiner <pfeiner@xxxxxxxxxx>
Cc: "Kirill A. Shutemov" <kirill@xxxxxxxxxxxxx>
Cc: Cyrill Gorcunov <gorcunov@xxxxxxxxxx>
Cc: Pavel Emelyanov <xemul@xxxxxxxxxxxxx>
Cc: Jamie Liu <jamieliu@xxxxxxxxxx>
Cc: Hugh Dickins <hughd@xxxxxxxxxx>
Cc: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
Signed-off-by: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
---
 fs/proc/task_mmu.c | 54 +++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 39 insertions(+), 15 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 65fc60a07c47..fa7b4aaeb9dd 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -905,13 +905,13 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	struct pagemapread *pm = walk->private;
 	pte_t *pte;
 	int err = 0;
-	pagemap_entry_t pme = make_pme(PM_NOT_PRESENT);
 
 	/* find the first VMA at or above 'addr' */
 	vma = find_vma(walk->mm, addr);
 	if (vma && pmd_trans_huge_lock(pmd, vma) == 1) {
 		for (; addr != end; addr += PAGE_SIZE) {
 			unsigned long offset;
+			pagemap_entry_t pme;
 
 			offset = (addr & ~PAGEMAP_WALK_MASK) >>
 					PAGE_SHIFT;
@@ -926,27 +926,51 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 
 	if (pmd_trans_unstable(pmd))
 		return 0;
-	for (; addr != end; addr += PAGE_SIZE) {
 
-		/* check to see if we've left 'vma' behind
-		 * and need a new, higher one */
-		if (vma && (addr >= vma->vm_end)) {
-			vma = find_vma(walk->mm, addr);
-			pme = make_pme(PM_NOT_PRESENT);
+	while (1) {
+		/* End of address space hole, which we mark as non-present. */
+		unsigned long hole_end;
+
+		if (vma)
+			hole_end = min(end, vma->vm_start);
+		else
+			hole_end = end;
+
+		for (; addr < hole_end; addr += PAGE_SIZE) {
+			pagemap_entry_t pme = make_pme(PM_NOT_PRESENT);
+
+			err = add_to_pagemap(addr, &pme, pm);
+			if (err)
+				return err;
 		}
 
-		/* check that 'vma' actually covers this address,
-		 * and that it isn't a huge page vma */
-		if (vma && (vma->vm_start <= addr) &&
-		    !is_vm_hugetlb_page(vma)) {
+		if (!vma || vma->vm_start >= end)
+			break;
+		/*
+		 * We can't possibly be in a hugetlb VMA. In general,
+		 * for a mm_walk with a pmd_entry and a hugetlb_entry,
+		 * the pmd_entry can only be called on addresses in a
+		 * hugetlb if the walk starts in a non-hugetlb VMA and
+		 * spans a hugepage VMA. Since pagemap_read walks are
+		 * PMD-sized and PMD-aligned, this will never be true.
+		 */
+		BUG_ON(is_vm_hugetlb_page(vma));
+
+		/* Addresses in the VMA. */
+		for (; addr < min(end, vma->vm_end); addr += PAGE_SIZE) {
+			pagemap_entry_t pme;
 			pte = pte_offset_map(pmd, addr);
 			pte_to_pagemap_entry(&pme, vma, addr, *pte);
-			/* unmap before userspace copy */
 			pte_unmap(pte);
+			err = add_to_pagemap(addr, &pme, pm);
+			if (err)
+				return err;
 		}
-		err = add_to_pagemap(addr, &pme, pm);
-		if (err)
-			return err;
+
+		if (addr == end)
+			break;
+
+		vma = find_vma(walk->mm, addr);
 	}
 
 	cond_resched();
-- 
1.9.3

From 7be09125c8414101341792b81cc07bb5aa4429a8 Mon Sep 17 00:00:00 2001
From: Konstantin Khlebnikov <khlebnikov@xxxxxxxxxxxxxx>
Date: Mon, 2 Mar 2015 14:29:22 +0900
Subject: [PATCH 2/2] proc/pagemap: walk page tables under pte lock

Lockless access to pte in pagemap_pte_range() might race with page
migration and trigger BUG_ON(!PageLocked()) in migration_entry_to_page():

CPU A (pagemap)                           CPU B (migration)
                                          lock_page()
                                          try_to_unmap(page, TTU_MIGRATION...)
                                               make_migration_entry()
                                               set_pte_at()
<read *pte>
pte_to_pagemap_entry()
                                          remove_migration_ptes()
                                          unlock_page()
    if(is_migration_entry())
        migration_entry_to_page()
            BUG_ON(!PageLocked(page))

Also, a lockless read might be non-atomic if the pte is larger than word size.
Other pte walkers (smaps, numa_maps, clear_refs) already lock ptes.

[n-horiguchi@xxxxxxxxxxxxx: backporting for v3.10.70]
Fixes: 052fb0d635df ("proc: report file/anon bit in /proc/pid/pagemap")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@xxxxxxxxxxxxxx>
Reported-by: Andrey Ryabinin <a.ryabinin@xxxxxxxxxxx>
Reviewed-by: Cyrill Gorcunov <gorcunov@xxxxxxxxxx>
Acked-by: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
Acked-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
Cc: <stable@xxxxxxxxxxxxxxx>	[3.5+]
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
---
 fs/proc/task_mmu.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index fa7b4aaeb9dd..500dda9a14f0 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -903,7 +903,8 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 {
 	struct vm_area_struct *vma;
 	struct pagemapread *pm = walk->private;
-	pte_t *pte;
+	spinlock_t *ptl;
+	pte_t *pte, *orig_pte;
 	int err = 0;
 
 	/* find the first VMA at or above 'addr' */
@@ -957,15 +958,19 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 		BUG_ON(is_vm_hugetlb_page(vma));
 
 		/* Addresses in the VMA. */
-		for (; addr < min(end, vma->vm_end); addr += PAGE_SIZE) {
+		orig_pte = pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
+		for (; addr < min(end, vma->vm_end); pte++, addr += PAGE_SIZE) {
 			pagemap_entry_t pme;
-			pte = pte_offset_map(pmd, addr);
+
 			pte_to_pagemap_entry(&pme, vma, addr, *pte);
-			pte_unmap(pte);
 			err = add_to_pagemap(addr, &pme, pm);
 			if (err)
-				return err;
+				break;
 		}
+		pte_unmap_unlock(orig_pte, ptl);
+
+		if (err)
+			return err;
 
 		if (addr == end)
 			break;
-- 
1.9.3

