On Wed, Mar 11, 2015 at 02:35:10PM +0100, gregkh@xxxxxxxxxxxxxxxxxxx wrote: > > The patch below does not apply to the 3.19-stable tree. > If someone wants it applied there, or to any other stable or longterm > tree, then please email the backport, including the original git commit > id to <stable@xxxxxxxxxxxxxxx>. > > thanks, > > greg k-h > > ------------------ original commit in Linus's tree ------------------ > > From e66f17ff71772b209eed39de35aaa99ba819c93d Mon Sep 17 00:00:00 2001 > From: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx> > Date: Wed, 11 Feb 2015 15:25:22 -0800 > Subject: [PATCH] mm/hugetlb: take page table lock in follow_huge_pmd() This patch depends on commit 61f77eda9bbf ("mm/hugetlb: reduce arch dependent code around follow_huge_*"), so we need backport this with it. Unfortunately, 61f77eda9bbf doesn't apply cleanly on v3.19.1 because of commit 25153a83b04f ("mm/hugetlb: take page table lock in follow_huge_pmd()"), which is already merged at v3.19.1 separately. IOW, these 3 patches should've been applied in the order of 61f77eda9bbf -> 25153a83b04f -> e66f17ff7177, but now only 25153a83b04f is in stable-3.19. So I attached modified versions of 61f77eda9bbf and e66f17ff7177, these should be applied onto v3.19.1 cleanly. Thanks, Naoya Horiguchi
From b0a4217b12c72697d7b48ac66c8547031ca373d7 Mon Sep 17 00:00:00 2001 From: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx> Date: Mon, 16 Mar 2015 16:41:08 +0900 Subject: [PATCH 1/2] mm/hugetlb: reduce arch dependent code around follow_huge_* Currently we have many duplicates in definitions around follow_huge_addr(), follow_huge_pmd(), and follow_huge_pud(), so this patch tries to remove the m. The basic idea is to put the default implementation for these functions in mm/hugetlb.c as weak symbols (regardless of CONFIG_ARCH_WANT_GENERAL_HUGETL B), and to implement arch-specific code only when the arch needs it. For follow_huge_addr(), only powerpc and ia64 have their own implementation, and in all other architectures this function just returns ERR_PTR(-EINVAL). So this patch sets returning ERR_PTR(-EINVAL) as default. As for follow_huge_(pmd|pud)(), if (pmd|pud)_huge() is implemented to always return 0 in your architecture (like in ia64 or sparc,) it's never called (the callsite is optimized away) no matter how implemented it is. So in such architectures, we don't need arch-specific implementation. In some architecture (like mips, s390 and tile,) their current arch-specific follow_huge_(pmd|pud)() are effectively identical with the common code, so this patch lets these architecture use the common code. One exception is metag, where pmd_huge() could return non-zero but it expects follow_huge_pmd() to always return NULL. This means that we need arch-specific implementation which returns NULL. This behavior looks strange to me (because non-zero pmd_huge() implies that the architecture supports PMD-based hugepage, so follow_huge_pmd() can/should return some relevant value,) but that's beyond this cleanup patch, so let's keep it. Justification of non-trivial changes: - in s390, follow_huge_pmd() checks !MACHINE_HAS_HPAGE at first, and this patch removes the check. This is OK because we can assume MACHINE_HAS_HPAGE is true when follow_huge_pmd() can be called (note that pmd_huge() has the same check and always returns 0 for !MACHINE_HAS_HPAGE.) - in s390 and mips, we use HPAGE_MASK instead of PMD_MASK as done in common code. This patch forces these archs use PMD_MASK, but it's OK because they are identical in both archs. In s390, both of HPAGE_SHIFT and PMD_SHIFT are 20. In mips, HPAGE_SHIFT is defined as (PAGE_SHIFT + PAGE_SHIFT - 3) and PMD_SHIFT is define as (PAGE_SHIFT + PAGE_SHIFT + PTE_ORDER - 3), but PTE_ORDER is always 0, so these are identical. [n-horiguchi@xxxxxxxxxxxxx: resolve conflict to apply to v3.19.1] Signed-off-by: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx> Acked-by: Hugh Dickins <hughd@xxxxxxxxxx> Cc: James Hogan <james.hogan@xxxxxxxxxx> Cc: David Rientjes <rientjes@xxxxxxxxxx> Cc: Mel Gorman <mel@xxxxxxxxx> Cc: Johannes Weiner <hannes@xxxxxxxxxxx> Cc: Michal Hocko <mhocko@xxxxxxx> Cc: Rik van Riel <riel@xxxxxxxxxx> Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx> Cc: Luiz Capitulino <lcapitulino@xxxxxxxxxx> Cc: Nishanth Aravamudan <nacc@xxxxxxxxxxxxxxxxxx> Cc: Lee Schermerhorn <lee.schermerhorn@xxxxxx> Cc: Steve Capper <steve.capper@xxxxxxxxxx> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> --- arch/arm/mm/hugetlbpage.c | 6 ------ arch/arm64/mm/hugetlbpage.c | 6 ------ arch/ia64/mm/hugetlbpage.c | 6 ------ arch/metag/mm/hugetlbpage.c | 6 ------ arch/mips/mm/hugetlbpage.c | 18 ------------------ arch/powerpc/mm/hugetlbpage.c | 8 ++++++++ arch/s390/mm/hugetlbpage.c | 20 -------------------- arch/sh/mm/hugetlbpage.c | 12 ------------ arch/sparc/mm/hugetlbpage.c | 12 ------------ arch/tile/mm/hugetlbpage.c | 28 ---------------------------- arch/x86/mm/hugetlbpage.c | 12 ------------ mm/hugetlb.c | 30 +++++++++++++++--------------- 12 files changed, 23 insertions(+), 141 deletions(-) diff --git a/arch/arm/mm/hugetlbpage.c b/arch/arm/mm/hugetlbpage.c index 66781bf34077..c72412415093 100644 --- a/arch/arm/mm/hugetlbpage.c +++ b/arch/arm/mm/hugetlbpage.c @@ -36,12 +36,6 @@ * of type casting from pmd_t * to pte_t *. */ -struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address, - int write) -{ - return ERR_PTR(-EINVAL); -} - int pud_huge(pud_t pud) { return 0; diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c index 023747bf4dd7..2de9d2e59d96 100644 --- a/arch/arm64/mm/hugetlbpage.c +++ b/arch/arm64/mm/hugetlbpage.c @@ -38,12 +38,6 @@ int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep) } #endif -struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address, - int write) -{ - return ERR_PTR(-EINVAL); -} - int pmd_huge(pmd_t pmd) { return !(pmd_val(pmd) & PMD_TABLE_BIT); diff --git a/arch/ia64/mm/hugetlbpage.c b/arch/ia64/mm/hugetlbpage.c index 76069c18ee42..52b7604b5215 100644 --- a/arch/ia64/mm/hugetlbpage.c +++ b/arch/ia64/mm/hugetlbpage.c @@ -114,12 +114,6 @@ int pud_huge(pud_t pud) return 0; } -struct page * -follow_huge_pmd(struct mm_struct *mm, unsigned long address, pmd_t *pmd, int write) -{ - return NULL; -} - void hugetlb_free_pgd_range(struct mmu_gather *tlb, unsigned long addr, unsigned long end, unsigned long floor, unsigned long ceiling) diff --git a/arch/metag/mm/hugetlbpage.c b/arch/metag/mm/hugetlbpage.c index 3c32075d2945..7ca80ac42ed5 100644 --- a/arch/metag/mm/hugetlbpage.c +++ b/arch/metag/mm/hugetlbpage.c @@ -94,12 +94,6 @@ int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep) return 0; } -struct page *follow_huge_addr(struct mm_struct *mm, - unsigned long address, int write) -{ - return ERR_PTR(-EINVAL); -} - int pmd_huge(pmd_t pmd) { return pmd_page_shift(pmd) > PAGE_SHIFT; diff --git a/arch/mips/mm/hugetlbpage.c b/arch/mips/mm/hugetlbpage.c index 4ec8ee10d371..06e0f421b41b 100644 --- a/arch/mips/mm/hugetlbpage.c +++ b/arch/mips/mm/hugetlbpage.c @@ -68,12 +68,6 @@ int is_aligned_hugepage_range(unsigned long addr, unsigned long len) return 0; } -struct page * -follow_huge_addr(struct mm_struct *mm, unsigned long address, int write) -{ - return ERR_PTR(-EINVAL); -} - int pmd_huge(pmd_t pmd) { return (pmd_val(pmd) & _PAGE_HUGE) != 0; @@ -83,15 +77,3 @@ int pud_huge(pud_t pud) { return (pud_val(pud) & _PAGE_HUGE) != 0; } - -struct page * -follow_huge_pmd(struct mm_struct *mm, unsigned long address, - pmd_t *pmd, int write) -{ - struct page *page; - - page = pte_page(*(pte_t *)pmd); - if (page) - page += ((address & ~HPAGE_MASK) >> PAGE_SHIFT); - return page; -} diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c index 620d0ec93e6f..7e408bfc7948 100644 --- a/arch/powerpc/mm/hugetlbpage.c +++ b/arch/powerpc/mm/hugetlbpage.c @@ -714,6 +714,14 @@ follow_huge_pmd(struct mm_struct *mm, unsigned long address, return NULL; } +struct page * +follow_huge_pud(struct mm_struct *mm, unsigned long address, + pud_t *pud, int write) +{ + BUG(); + return NULL; +} + static unsigned long hugepte_addr_end(unsigned long addr, unsigned long end, unsigned long sz) { diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c index 3c80d2e38f03..210ffede0153 100644 --- a/arch/s390/mm/hugetlbpage.c +++ b/arch/s390/mm/hugetlbpage.c @@ -192,12 +192,6 @@ int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep) return 0; } -struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address, - int write) -{ - return ERR_PTR(-EINVAL); -} - int pmd_huge(pmd_t pmd) { if (!MACHINE_HAS_HPAGE) @@ -210,17 +204,3 @@ int pud_huge(pud_t pud) { return 0; } - -struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address, - pmd_t *pmdp, int write) -{ - struct page *page; - - if (!MACHINE_HAS_HPAGE) - return NULL; - - page = pmd_page(*pmdp); - if (page) - page += ((address & ~HPAGE_MASK) >> PAGE_SHIFT); - return page; -} diff --git a/arch/sh/mm/hugetlbpage.c b/arch/sh/mm/hugetlbpage.c index d7762349ea48..534bc978af8a 100644 --- a/arch/sh/mm/hugetlbpage.c +++ b/arch/sh/mm/hugetlbpage.c @@ -67,12 +67,6 @@ int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep) return 0; } -struct page *follow_huge_addr(struct mm_struct *mm, - unsigned long address, int write) -{ - return ERR_PTR(-EINVAL); -} - int pmd_huge(pmd_t pmd) { return 0; @@ -82,9 +76,3 @@ int pud_huge(pud_t pud) { return 0; } - -struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address, - pmd_t *pmd, int write) -{ - return NULL; -} diff --git a/arch/sparc/mm/hugetlbpage.c b/arch/sparc/mm/hugetlbpage.c index d329537739c6..4242eab12e10 100644 --- a/arch/sparc/mm/hugetlbpage.c +++ b/arch/sparc/mm/hugetlbpage.c @@ -215,12 +215,6 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr, return entry; } -struct page *follow_huge_addr(struct mm_struct *mm, - unsigned long address, int write) -{ - return ERR_PTR(-EINVAL); -} - int pmd_huge(pmd_t pmd) { return 0; @@ -230,9 +224,3 @@ int pud_huge(pud_t pud) { return 0; } - -struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address, - pmd_t *pmd, int write) -{ - return NULL; -} diff --git a/arch/tile/mm/hugetlbpage.c b/arch/tile/mm/hugetlbpage.c index 3270e0019266..8416240c322c 100644 --- a/arch/tile/mm/hugetlbpage.c +++ b/arch/tile/mm/hugetlbpage.c @@ -150,12 +150,6 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr) return NULL; } -struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address, - int write) -{ - return ERR_PTR(-EINVAL); -} - int pmd_huge(pmd_t pmd) { return !!(pmd_val(pmd) & _PAGE_HUGE_PAGE); @@ -166,28 +160,6 @@ int pud_huge(pud_t pud) return !!(pud_val(pud) & _PAGE_HUGE_PAGE); } -struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address, - pmd_t *pmd, int write) -{ - struct page *page; - - page = pte_page(*(pte_t *)pmd); - if (page) - page += ((address & ~PMD_MASK) >> PAGE_SHIFT); - return page; -} - -struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address, - pud_t *pud, int write) -{ - struct page *page; - - page = pte_page(*(pte_t *)pud); - if (page) - page += ((address & ~PUD_MASK) >> PAGE_SHIFT); - return page; -} - int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep) { return 0; diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c index 006cc914994b..9161f764121e 100644 --- a/arch/x86/mm/hugetlbpage.c +++ b/arch/x86/mm/hugetlbpage.c @@ -52,20 +52,8 @@ int pud_huge(pud_t pud) return 0; } -struct page * -follow_huge_pmd(struct mm_struct *mm, unsigned long address, - pmd_t *pmd, int write) -{ - return NULL; -} #else -struct page * -follow_huge_addr(struct mm_struct *mm, unsigned long address, int write) -{ - return ERR_PTR(-EINVAL); -} - /* * pmd_huge() returns 1 if @pmd is hugetlb related entry, that is normal * hugetlb entry or non-present (migration or hwpoisoned) hugetlb entry. diff --git a/mm/hugetlb.c b/mm/hugetlb.c index c49586f40758..dd42878549d5 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -3660,7 +3660,20 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr) return (pte_t *) pmd; } -struct page * +#endif /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */ + +/* + * These functions are overwritable if your architecture needs its own + * behavior. + */ +struct page * __weak +follow_huge_addr(struct mm_struct *mm, unsigned long address, + int write) +{ + return ERR_PTR(-EINVAL); +} + +struct page * __weak follow_huge_pmd(struct mm_struct *mm, unsigned long address, pmd_t *pmd, int write) { @@ -3674,7 +3687,7 @@ follow_huge_pmd(struct mm_struct *mm, unsigned long address, return page; } -struct page * +struct page * __weak follow_huge_pud(struct mm_struct *mm, unsigned long address, pud_t *pud, int write) { @@ -3686,19 +3699,6 @@ follow_huge_pud(struct mm_struct *mm, unsigned long address, return page; } -#else /* !CONFIG_ARCH_WANT_GENERAL_HUGETLB */ - -/* Can be overriden by architectures */ -struct page * __weak -follow_huge_pud(struct mm_struct *mm, unsigned long address, - pud_t *pud, int write) -{ - BUG(); - return NULL; -} - -#endif /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */ - #ifdef CONFIG_MEMORY_FAILURE /* Should be called in hugetlb_lock */ -- 1.9.3
From 25153a83b04f6662618d8a7fa5e028a94288873f Mon Sep 17 00:00:00 2001 From: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx> Date: Mon, 16 Mar 2015 16:42:43 +0900 Subject: [PATCH 2/2] mm/hugetlb: take page table lock in follow_huge_pmd() We have a race condition between move_pages() and freeing hugepages, where move_pages() calls follow_page(FOLL_GET) for hugepages internally and tries to get its refcount without preventing concurrent freeing. This race crashes the kernel, so this patch fixes it by moving FOLL_GET code for hugepages into follow_huge_pmd() with taking the page table lock. This patch intentionally removes page==NULL check after pte_page. This is justified because pte_page() never returns NULL for any architectures or configurations. This patch changes the behavior of follow_huge_pmd() for tail pages and then tail pages can be pinned/returned. So the caller must be changed to properly handle the returned tail pages. We could have a choice to add the similar locking to follow_huge_(addr|pud) for consistency, but it's not necessary because currently these functions don't support FOLL_GET flag, so let's leave it for future development. Here is the reproducer: $ cat movepages.c #include <stdio.h> #include <stdlib.h> #include <numaif.h> #define ADDR_INPUT 0x700000000000UL #define HPS 0x200000 #define PS 0x1000 int main(int argc, char *argv[]) { int i; int nr_hp = strtol(argv[1], NULL, 0); int nr_p = nr_hp * HPS / PS; int ret; void **addrs; int *status; int *nodes; pid_t pid; pid = strtol(argv[2], NULL, 0); addrs = malloc(sizeof(char *) * nr_p + 1); status = malloc(sizeof(char *) * nr_p + 1); nodes = malloc(sizeof(char *) * nr_p + 1); while (1) { for (i = 0; i < nr_p; i++) { addrs[i] = (void *)ADDR_INPUT + i * PS; nodes[i] = 1; status[i] = 0; } ret = numa_move_pages(pid, nr_p, addrs, nodes, status, MPOL_MF_MOVE_ALL); if (ret == -1) err("move_pages"); for (i = 0; i < nr_p; i++) { addrs[i] = (void *)ADDR_INPUT + i * PS; nodes[i] = 0; status[i] = 0; } ret = numa_move_pages(pid, nr_p, addrs, nodes, status, MPOL_MF_MOVE_ALL); if (ret == -1) err("move_pages"); } return 0; } $ cat hugepage.c #include <stdio.h> #include <sys/mman.h> #include <string.h> #define ADDR_INPUT 0x700000000000UL #define HPS 0x200000 int main(int argc, char *argv[]) { int nr_hp = strtol(argv[1], NULL, 0); char *p; while (1) { p = mmap((void *)ADDR_INPUT, nr_hp * HPS, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0); if (p != (void *)ADDR_INPUT) { perror("mmap"); break; } memset(p, 0, nr_hp * HPS); munmap(p, nr_hp * HPS); } } $ sysctl vm.nr_hugepages=40 $ ./hugepage 10 & $ ./movepages 10 $(pgrep -f hugepage) [n-horiguchi@xxxxxxxxxxxxx: resolve conflict to apply to v3.19.1] Fixes: e632a938d914 ("mm: migrate: add hugepage migration code to move_pages()") Signed-off-by: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx> Reported-by: Hugh Dickins <hughd@xxxxxxxxxx> Cc: James Hogan <james.hogan@xxxxxxxxxx> Cc: David Rientjes <rientjes@xxxxxxxxxx> Cc: Mel Gorman <mel@xxxxxxxxx> Cc: Johannes Weiner <hannes@xxxxxxxxxxx> Cc: Michal Hocko <mhocko@xxxxxxx> Cc: Rik van Riel <riel@xxxxxxxxxx> Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx> Cc: Luiz Capitulino <lcapitulino@xxxxxxxxxx> Cc: Nishanth Aravamudan <nacc@xxxxxxxxxxxxxxxxxx> Cc: Lee Schermerhorn <lee.schermerhorn@xxxxxx> Cc: Steve Capper <steve.capper@xxxxxxxxxx> Cc: <stable@xxxxxxxxxxxxxxx> [3.12+] Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> --- include/linux/hugetlb.h | 8 ++++---- include/linux/swapops.h | 4 ++++ mm/gup.c | 25 ++++++++----------------- mm/hugetlb.c | 48 ++++++++++++++++++++++++++++++++++-------------- mm/migrate.c | 5 +++-- 5 files changed, 53 insertions(+), 37 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 431b7fc605c9..e235ec5f1f28 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -99,9 +99,9 @@ int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep); struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address, int write); struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address, - pmd_t *pmd, int write); + pmd_t *pmd, int flags); struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address, - pud_t *pud, int write); + pud_t *pud, int flags); int pmd_huge(pmd_t pmd); int pud_huge(pud_t pmd); unsigned long hugetlb_change_protection(struct vm_area_struct *vma, @@ -133,8 +133,8 @@ static inline void hugetlb_report_meminfo(struct seq_file *m) static inline void hugetlb_show_meminfo(void) { } -#define follow_huge_pmd(mm, addr, pmd, write) NULL -#define follow_huge_pud(mm, addr, pud, write) NULL +#define follow_huge_pmd(mm, addr, pmd, flags) NULL +#define follow_huge_pud(mm, addr, pud, flags) NULL #define prepare_hugepage_range(file, addr, len) (-EINVAL) #define pmd_huge(x) 0 #define pud_huge(x) 0 diff --git a/include/linux/swapops.h b/include/linux/swapops.h index 6adfb7bfbf44..e288d5c016a7 100644 --- a/include/linux/swapops.h +++ b/include/linux/swapops.h @@ -137,6 +137,8 @@ static inline void make_migration_entry_read(swp_entry_t *entry) *entry = swp_entry(SWP_MIGRATION_READ, swp_offset(*entry)); } +extern void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep, + spinlock_t *ptl); extern void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd, unsigned long address); extern void migration_entry_wait_huge(struct vm_area_struct *vma, @@ -150,6 +152,8 @@ static inline int is_migration_entry(swp_entry_t swp) } #define migration_entry_to_page(swp) NULL static inline void make_migration_entry_read(swp_entry_t *entryp) { } +static inline void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep, + spinlock_t *ptl) { } static inline void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd, unsigned long address) { } static inline void migration_entry_wait_huge(struct vm_area_struct *vma, diff --git a/mm/gup.c b/mm/gup.c index 9b2afbfe67e3..e29c3745a893 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -167,10 +167,10 @@ struct page *follow_page_mask(struct vm_area_struct *vma, if (pud_none(*pud)) return no_page_table(vma, flags); if (pud_huge(*pud) && vma->vm_flags & VM_HUGETLB) { - if (flags & FOLL_GET) - return NULL; - page = follow_huge_pud(mm, address, pud, flags & FOLL_WRITE); - return page; + page = follow_huge_pud(mm, address, pud, flags); + if (page) + return page; + return no_page_table(vma, flags); } if (unlikely(pud_bad(*pud))) return no_page_table(vma, flags); @@ -179,19 +179,10 @@ struct page *follow_page_mask(struct vm_area_struct *vma, if (pmd_none(*pmd)) return no_page_table(vma, flags); if (pmd_huge(*pmd) && vma->vm_flags & VM_HUGETLB) { - page = follow_huge_pmd(mm, address, pmd, flags & FOLL_WRITE); - if (flags & FOLL_GET) { - /* - * Refcount on tail pages are not well-defined and - * shouldn't be taken. The caller should handle a NULL - * return when trying to follow tail pages. - */ - if (PageHead(page)) - get_page(page); - else - page = NULL; - } - return page; + page = follow_huge_pmd(mm, address, pmd, flags); + if (page) + return page; + return no_page_table(vma, flags); } if ((flags & FOLL_NUMA) && pmd_numa(*pmd)) return no_page_table(vma, flags); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index dd42878549d5..adafced1aa17 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -3675,28 +3675,48 @@ follow_huge_addr(struct mm_struct *mm, unsigned long address, struct page * __weak follow_huge_pmd(struct mm_struct *mm, unsigned long address, - pmd_t *pmd, int write) + pmd_t *pmd, int flags) { - struct page *page; - - if (!pmd_present(*pmd)) - return NULL; - page = pte_page(*(pte_t *)pmd); - if (page) - page += ((address & ~PMD_MASK) >> PAGE_SHIFT); + struct page *page = NULL; + spinlock_t *ptl; +retry: + ptl = pmd_lockptr(mm, pmd); + spin_lock(ptl); + /* + * make sure that the address range covered by this pmd is not + * unmapped from other threads. + */ + if (!pmd_huge(*pmd)) + goto out; + if (pmd_present(*pmd)) { + page = pte_page(*(pte_t *)pmd) + + ((address & ~PMD_MASK) >> PAGE_SHIFT); + if (flags & FOLL_GET) + get_page(page); + } else { + if (is_hugetlb_entry_migration(huge_ptep_get((pte_t *)pmd))) { + spin_unlock(ptl); + __migration_entry_wait(mm, (pte_t *)pmd, ptl); + goto retry; + } + /* + * hwpoisoned entry is treated as no_page_table in + * follow_page_mask(). + */ + } +out: + spin_unlock(ptl); return page; } struct page * __weak follow_huge_pud(struct mm_struct *mm, unsigned long address, - pud_t *pud, int write) + pud_t *pud, int flags) { - struct page *page; + if (flags & FOLL_GET) + return NULL; - page = pte_page(*(pte_t *)pud); - if (page) - page += ((address & ~PUD_MASK) >> PAGE_SHIFT); - return page; + return pte_page(*(pte_t *)pud) + ((address & ~PUD_MASK) >> PAGE_SHIFT); } #ifdef CONFIG_MEMORY_FAILURE diff --git a/mm/migrate.c b/mm/migrate.c index 344cdf692fc8..be6d1edcfcb7 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -229,7 +229,7 @@ static void remove_migration_ptes(struct page *old, struct page *new) * get to the page and wait until migration is finished. * When we return from this function the fault will be retried. */ -static void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep, +void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep, spinlock_t *ptl) { pte_t pte; @@ -1268,7 +1268,8 @@ static int do_move_page_to_node_array(struct mm_struct *mm, goto put_and_set; if (PageHuge(page)) { - isolate_huge_page(page, &pagelist); + if (PageHead(page)) + isolate_huge_page(page, &pagelist); goto put_and_set; } -- 1.9.3