On Mon, Mar 07, 2016 at 02:57:18PM +0300, Kirill A. Shutemov wrote: > freeze_page() and unfreeze_page() helpers evolved in rather complex > beasts. It would be nice to cut complexity of this code. > > This patch rewrites freeze_page() using standard try_to_unmap(). > unfreeze_page() is rewritten with remove_migration_ptes(). > > The result is much simpler. > > But the new variant is somewhat slower for PTE-mapped THPs. > Current helpers iterates over VMAs the compound page is mapped to, and > then over ptes within this VMA. New helpers iterates over small page, > then over VMA the small page mapped to, and only then find relevant pte. > > We have short cut for PMD-mapped THP: we directly install migration > entries on PMD split. > > I don't think the slowdown is critical, considering how much simpler > result is and that split_huge_page() is quite rare nowadays. It only > happens due memory pressure or migration. > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx> > --- > include/linux/huge_mm.h | 10 ++- > mm/huge_memory.c | 201 +++++++----------------------------------------- > mm/rmap.c | 9 ++- > 3 files changed, 40 insertions(+), 180 deletions(-) Andrew, could you please fold patch below into this one. >From 01bd093fa5022cdc3e9a358f5f3787c47fb64108 Mon Sep 17 00:00:00 2001 From: "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx> Date: Fri, 11 Mar 2016 01:53:33 +0300 Subject: [PATCH] thp: make sure we replace the right page with migration entires split_huge_pmd_address(freeze == true) would split the pmd and setup migration entries into ptes. To get it work properly we need a page to check pmd against. Otherwise we can end up replacing wrong page (i.e. after COW break). Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx> --- include/linux/huge_mm.h | 4 ++-- mm/huge_memory.c | 17 +++++++++++++---- mm/rmap.c | 3 ++- 3 files changed, 17 insertions(+), 7 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index d257d27cdcbc..79b0ef6aaa14 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -109,7 +109,7 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, void split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address, - bool freeze); + bool freeze, struct page *page); extern int hugepage_madvise(struct vm_area_struct *vma, unsigned long *vm_flags, int advice); @@ -177,7 +177,7 @@ static inline void deferred_split_huge_page(struct page *page) {} do { } while (0) static inline void split_huge_pmd_address(struct vm_area_struct *vma, - unsigned long address, bool freeze) {} + unsigned long address, bool freeze, struct page *page) {} static inline int hugepage_madvise(struct vm_area_struct *vma, unsigned long *vm_flags, int advice) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 8399b2e48d43..2787cd032b0e 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -3085,7 +3085,7 @@ out: } void split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address, - bool freeze) + bool freeze, struct page *page) { pgd_t *pgd; pud_t *pud; @@ -3102,6 +3102,15 @@ void split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address, pmd = pmd_offset(pud, address); if (!pmd_present(*pmd) || (!pmd_trans_huge(*pmd) && !pmd_devmap(*pmd))) return; + + /* + * If caller asks to setup a migration entries, we need a page to check + * pmd against. Otherwise we can end up replacing wrong page. + */ + VM_BUG_ON(freeze && !page); + if (page && page != pmd_page(*pmd)) + return; + /* * Caller holds the mmap_sem write mode, so a huge pmd cannot * materialize from under us. @@ -3122,7 +3131,7 @@ void vma_adjust_trans_huge(struct vm_area_struct *vma, if (start & ~HPAGE_PMD_MASK && (start & HPAGE_PMD_MASK) >= vma->vm_start && (start & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE <= vma->vm_end) - split_huge_pmd_address(vma, start, false); + split_huge_pmd_address(vma, start, false, NULL); /* * If the new end address isn't hpage aligned and it could @@ -3132,7 +3141,7 @@ void vma_adjust_trans_huge(struct vm_area_struct *vma, if (end & ~HPAGE_PMD_MASK && (end & HPAGE_PMD_MASK) >= vma->vm_start && (end & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE <= vma->vm_end) - split_huge_pmd_address(vma, end, false); + split_huge_pmd_address(vma, end, false, NULL); /* * If we're also updating the vma->vm_next->vm_start, if the new @@ -3146,7 +3155,7 @@ void vma_adjust_trans_huge(struct vm_area_struct *vma, if (nstart & ~HPAGE_PMD_MASK && (nstart & HPAGE_PMD_MASK) >= next->vm_start && (nstart & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE <= next->vm_end) - split_huge_pmd_address(next, nstart, false); + split_huge_pmd_address(next, nstart, false, NULL); } } diff --git a/mm/rmap.c b/mm/rmap.c index 8fa96fa3225e..c399a0d41b31 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1432,7 +1432,8 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma, goto out; if (flags & TTU_SPLIT_HUGE_PMD) { - split_huge_pmd_address(vma, address, flags & TTU_MIGRATION); + split_huge_pmd_address(vma, address, + flags & TTU_MIGRATION, page); /* check if we have anything to do after split */ if (page_mapcount(page) == 0) goto out; -- Kirill A. Shutemov -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>