Re: [patch NOT added to the 3.12 stable tree] mm: add !pte_present() check on existing hugetlb_entry callbacks

Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx> · Fri, 20 Jun 2014 10:57:20 -0400

Hi Jiri,

On Fri, Jun 20, 2014 at 09:13:32AM +0200, Jiri Slaby wrote:
> From: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
> 
> This patch does NOT apply to the 3.12 stable tree. If you still want
> it applied, please provide a backport.

In kernel 3.12, queue_pages_hugetlb_pmd_range() takes vma->vm_mm->page_table_lock
as the page table lock; that code was introduced by commit e2d8cf40552 "migrate: add
hugepage migration code to migrate_pages()".
After 3.12, mainline converted this to the split pmd lock in commit
cb900f4121544 "mm, hugetlb: convert hugetlbfs to use split pmd lock".

Stable-3.12 doesn't have the split pmd lock, so just adding the pte_present check
while keeping vma->vm_mm->page_table_lock should be correct.
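
For anyone reviewing the backport, here is a minimal userspace sketch of why
the check has to be !pte_present() rather than pte_none(). It is a toy model,
not kernel code: the toy_* names and the single "present" bit are assumptions
made up for illustration and do not reflect the real pte encoding of any
architecture. The point is only that an entry can be non-empty and still not
map a page (e.g. while a hugepage is being migrated), and the old check lets
such an entry through to pte_page().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: names and bit layout are hypothetical, not the kernel's. */
#define TOY_PRESENT_BIT 0x1u

typedef struct { uint64_t val; } toy_pte_t;

/* Entry is completely empty. */
static bool toy_pte_none(toy_pte_t pte)
{
	return pte.val == 0;
}

/* Entry maps a page right now. */
static bool toy_pte_present(toy_pte_t pte)
{
	return (pte.val & TOY_PRESENT_BIT) != 0;
}

/* Old callback behaviour: skip only empty entries. */
static bool toy_skip_old(toy_pte_t pte)
{
	return toy_pte_none(pte);
}

/* Patched callback behaviour: skip everything that does not map a page. */
static bool toy_skip_new(toy_pte_t pte)
{
	return !toy_pte_present(pte);
}

/* Walk the three interesting states through both checks. */
static void toy_self_check(void)
{
	toy_pte_t none      = { 0 };
	toy_pte_t present   = { 0x1000 | TOY_PRESENT_BIT };
	toy_pte_t migration = { 0x2000 };	/* non-empty, present bit clear */

	assert(toy_skip_old(none) && toy_skip_new(none));
	assert(!toy_skip_old(present) && !toy_skip_new(present));
	/* The buggy case: old check does not skip, new check does. */
	assert(!toy_skip_old(migration));
	assert(toy_skip_new(migration));
}
```

In this model the two checks only disagree on the non-empty, non-present
entry, which is the case the patch is meant to catch.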

Here is the diff:

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 390bdab01c3c..ad4df869c907 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1353,7 +1353,7 @@ static int gather_hugetbl_stats(pte_t *pte, unsigned long hmask,
 	struct numa_maps *md;
 	struct page *page;
 
-	if (pte_none(*pte))
+	if (!pte_present(*pte))
 		return 0;
 
 	page = pte_page(*pte);
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 927a69cf354a..a005cc9f6f18 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -525,9 +525,13 @@ static void queue_pages_hugetlb_pmd_range(struct vm_area_struct *vma,
 #ifdef CONFIG_HUGETLB_PAGE
 	int nid;
 	struct page *page;
+	pte_t entry;
 
 	spin_lock(&vma->vm_mm->page_table_lock);
-	page = pte_page(huge_ptep_get((pte_t *)pmd));
+	entry = huge_ptep_get((pte_t *)pmd);
+	if (!pte_present(entry))
+		goto unlock;
+	page = pte_page(entry);
 	nid = page_to_nid(page);
 	if (node_isset(nid, *nodes) == !!(flags & MPOL_MF_INVERT))
 		goto unlock;
-- 

Thanks,
Naoya Horiguchi


> ===============
> 
> commit d4c54919ed86302094c0ca7d48a8cbd4ee753e92 upstream.
> 
> The page table walker doesn't check non-present hugetlb entry in common
> path, so hugetlb_entry() callbacks must check it.  The reason for this
> behavior is that some callers want to handle it in its own way.
> 
> [ I think that reason is bogus, btw - it should just do what the regular
>   code does, which is to call the "pte_hole()" function for such hugetlb
>   entries  - Linus]
> 
> However, some callers don't check it now, which causes unpredictable
> result, for example when we have a race between migrating hugepage and
> reading /proc/pid/numa_maps.  This patch fixes it by adding !pte_present
> checks on buggy callbacks.
> 
> This bug has existed for years and became visible with the introduction
> of hugepage migration.
> 
> ChangeLog v2:
> - fix if condition (check !pte_present() instead of pte_present())
> 
> Reported-by: Sasha Levin <sasha.levin@xxxxxxxxxx>
> Signed-off-by: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
> Cc: Rik van Riel <riel@xxxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx> [3.12+]
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> [ Backported to 3.15.  Signed-off-by: Josh Boyer <jwboyer@xxxxxxxxxxxxxxxxx> ]
> Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> ---
>  fs/proc/task_mmu.c | 2 +-
>  mm/mempolicy.c     | 6 +++++-
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 442177b1119a..c4b2646b6d7c 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -1351,7 +1351,7 @@ static int gather_hugetbl_stats(pte_t *pte, unsigned long hmask,
>  	struct numa_maps *md;
>  	struct page *page;
>  
> -	if (pte_none(*pte))
> +	if (!pte_present(*pte))
>  		return 0;
>  
>  	page = pte_page(*pte);
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 78e1472933ea..30cc47f8ffa0 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -526,9 +526,13 @@ static void queue_pages_hugetlb_pmd_range(struct vm_area_struct *vma,
>  	int nid;
>  	struct page *page;
>  	spinlock_t *ptl;
> +	pte_t entry;
>  
>  	ptl = huge_pte_lock(hstate_vma(vma), vma->vm_mm, (pte_t *)pmd);
> -	page = pte_page(huge_ptep_get((pte_t *)pmd));
> +	entry = huge_ptep_get((pte_t *)pmd);
> +	if (!pte_present(entry))
> +		goto unlock;
> +	page = pte_page(entry);
>  	nid = page_to_nid(page);
>  	if (node_isset(nid, *nodes) == !!(flags & MPOL_MF_INVERT))
>  		goto unlock;
> -- 
> 2.0.0
> 
> 