The patch titled
     Subject: mm, THP, swap: support PMD swap mapping in swapoff
has been added to the -mm tree. Its filename is
     mm-thp-swap-support-pmd-swap-mapping-in-swapoff.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-thp-swap-support-pmd-swap-mapping-in-swapoff.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-thp-swap-support-pmd-swap-mapping-in-swapoff.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Huang Ying <ying.huang@xxxxxxxxx>
Subject: mm, THP, swap: support PMD swap mapping in swapoff

During swapoff, for a huge swap cluster, we need to allocate a THP, read
the cluster's contents into the THP, and unuse the PMD and PTE swap
mappings to it. If we fail to allocate a THP, the huge swap cluster is
split. During unuse, if we find that the swap cluster mapped by a PMD
swap mapping has already been split, we split the PMD swap mapping and
unuse the PTEs.

Link: http://lkml.kernel.org/r/20180622035151.6676-13-ying.huang@xxxxxxxxx
Signed-off-by: "Huang, Ying" <ying.huang@xxxxxxxxx>
Cc: "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx>
Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
Cc: Michal Hocko <mhocko@xxxxxxxx>
Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
Cc: Shaohua Li <shli@xxxxxxxxxx>
Cc: Hugh Dickins <hughd@xxxxxxxxxx>
Cc: Minchan Kim <minchan@xxxxxxxxxx>
Cc: Rik van Riel <riel@xxxxxxxxxx>
Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
Cc: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
Cc: Zi Yan <zi.yan@xxxxxxxxxxxxxx>
Cc: Daniel Jordan <daniel.m.jordan@xxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

diff -puN include/asm-generic/pgtable.h~mm-thp-swap-support-pmd-swap-mapping-in-swapoff include/asm-generic/pgtable.h
--- a/include/asm-generic/pgtable.h~mm-thp-swap-support-pmd-swap-mapping-in-swapoff
+++ a/include/asm-generic/pgtable.h
@@ -931,22 +931,13 @@ static inline int pmd_none_or_trans_huge
 	barrier();
 #endif
 	/*
-	 * !pmd_present() checks for pmd migration entries
-	 *
-	 * The complete check uses is_pmd_migration_entry() in linux/swapops.h
-	 * But using that requires moving current function and pmd_trans_unstable()
-	 * to linux/swapops.h to resovle dependency, which is too much code move.
-	 *
-	 * !pmd_present() is equivalent to is_pmd_migration_entry() currently,
-	 * because !pmd_present() pages can only be under migration not swapped
-	 * out.
-	 *
-	 * pmd_none() is preseved for future condition checks on pmd migration
+	 * pmd_none() is preseved for future condition checks on pmd swap
 	 * entries and not confusing with this function name, although it is
 	 * redundant with !pmd_present().
 	 */
 	if (pmd_none(pmdval) || pmd_trans_huge(pmdval) ||
-		(IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION) && !pmd_present(pmdval)))
+	    ((IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION) ||
+	      IS_ENABLED(CONFIG_THP_SWAP)) && !pmd_present(pmdval)))
 		return 1;
 	if (unlikely(pmd_bad(pmdval))) {
 		pmd_clear_bad(pmd);
diff -puN include/linux/huge_mm.h~mm-thp-swap-support-pmd-swap-mapping-in-swapoff include/linux/huge_mm.h
--- a/include/linux/huge_mm.h~mm-thp-swap-support-pmd-swap-mapping-in-swapoff
+++ a/include/linux/huge_mm.h
@@ -405,6 +405,8 @@ static inline gfp_t alloc_hugepage_direc
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 #ifdef CONFIG_THP_SWAP
+extern int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+			       unsigned long address, pmd_t orig_pmd);
 extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd);
 
 static inline bool transparent_hugepage_swapin_enabled(
@@ -430,6 +432,12 @@ static inline bool transparent_hugepage_
 	return false;
 }
 #else /* CONFIG_THP_SWAP */
+static inline int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+				      unsigned long address, pmd_t orig_pmd)
+{
+	return 0;
+}
+
 static inline int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
 {
 	return 0;
diff -puN mm/huge_memory.c~mm-thp-swap-support-pmd-swap-mapping-in-swapoff mm/huge_memory.c
--- a/mm/huge_memory.c~mm-thp-swap-support-pmd-swap-mapping-in-swapoff
+++ a/mm/huge_memory.c
@@ -1664,8 +1664,8 @@ static void __split_huge_swap_pmd(struct
 	pmd_populate(mm, pmd, pgtable);
 }
 
-static int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
-			       unsigned long address, pmd_t orig_pmd)
+int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+			unsigned long address, pmd_t orig_pmd)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	spinlock_t *ptl;
diff -puN mm/swapfile.c~mm-thp-swap-support-pmd-swap-mapping-in-swapoff mm/swapfile.c
--- a/mm/swapfile.c~mm-thp-swap-support-pmd-swap-mapping-in-swapoff
+++ a/mm/swapfile.c
@@ -1933,6 +1933,11 @@ static inline int pte_same_as_swp(pte_t
 	return pte_same(pte_swp_clear_soft_dirty(pte), swp_pte);
 }
 
+static inline int pmd_same_as_swp(pmd_t pmd, pmd_t swp_pmd)
+{
+	return pmd_same(pmd_swp_clear_soft_dirty(pmd), swp_pmd);
+}
+
 /*
  * No need to decide whether this PTE shares the swap entry with others,
  * just let do_wp_page work it out if a write is requested later - to
@@ -1994,6 +1999,57 @@ out_nolock:
 	return ret;
 }
 
+#ifdef CONFIG_THP_SWAP
+static int unuse_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+		     unsigned long addr, swp_entry_t entry, struct page *page)
+{
+	struct mem_cgroup *memcg;
+	struct swap_info_struct *si;
+	spinlock_t *ptl;
+	int ret = 1;
+
+	if (mem_cgroup_try_charge(page, vma->vm_mm, GFP_KERNEL,
+				  &memcg, true)) {
+		ret = -ENOMEM;
+		goto out_nolock;
+	}
+
+	ptl = pmd_lock(vma->vm_mm, pmd);
+	if (unlikely(!pmd_same_as_swp(*pmd, swp_entry_to_pmd(entry)))) {
+		mem_cgroup_cancel_charge(page, memcg, true);
+		ret = 0;
+		goto out;
+	}
+
+	add_mm_counter(vma->vm_mm, MM_SWAPENTS, -HPAGE_PMD_NR);
+	add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR);
+	get_page(page);
+	set_pmd_at(vma->vm_mm, addr, pmd,
+		   pmd_mkold(mk_huge_pmd(page, vma->vm_page_prot)));
+	page_add_anon_rmap(page, vma, addr, true);
+	mem_cgroup_commit_charge(page, memcg, true, true);
+	si = _swap_info_get(entry);
+	if (si)
+		swap_free_cluster(si, entry);
+	/*
+	 * Move the page to the active list so it is not
+	 * immediately swapped out again after swapon.
+	 */
+	activate_page(page);
+out:
+	spin_unlock(ptl);
+out_nolock:
+	return ret;
+}
+#else
+static inline int unuse_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+			    unsigned long addr, swp_entry_t entry,
+			    struct page *page)
+{
+	return 0;
+}
+#endif
+
 static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 				unsigned long addr, unsigned long end,
 				swp_entry_t entry, struct page *page)
@@ -2034,7 +2090,7 @@ static inline int unuse_pmd_range(struct
 				unsigned long addr, unsigned long end,
 				swp_entry_t entry, struct page *page)
 {
-	pmd_t *pmd;
+	pmd_t swp_pmd = swp_entry_to_pmd(entry), *pmd, orig_pmd;
 	unsigned long next;
 	int ret;
 
@@ -2042,6 +2098,24 @@ static inline int unuse_pmd_range(struct
 	do {
 		cond_resched();
 		next = pmd_addr_end(addr, end);
+		orig_pmd = *pmd;
+		if (thp_swap_supported() && is_swap_pmd(orig_pmd)) {
+			if (likely(!pmd_same_as_swp(orig_pmd, swp_pmd)))
+				continue;
+			/* Huge cluster has been split already */
+			if (!PageTransCompound(page)) {
+				ret = split_huge_swap_pmd(vma, pmd,
+							  addr, orig_pmd);
+				if (ret)
+					return ret;
+				ret = unuse_pte_range(vma, pmd, addr,
+						      next, entry, page);
+			} else
+				ret = unuse_pmd(vma, pmd, addr, entry, page);
+			if (ret)
+				return ret;
+			continue;
+		}
 		if (pmd_none_or_trans_huge_or_clear_bad(pmd))
 			continue;
 		ret = unuse_pte_range(vma, pmd, addr, next, entry, page);
@@ -2206,6 +2280,7 @@ int try_to_unuse(unsigned int type, bool
 	 * to prevent compiler doing
 	 * something odd.
 	 */
+	struct swap_cluster_info *ci = NULL;
 	unsigned char swcount;
 	struct page *page;
 	swp_entry_t entry;
@@ -2235,6 +2310,7 @@ int try_to_unuse(unsigned int type, bool
 	 * there are races when an instance of an entry might be missed.
 	 */
 	while ((i = find_next_to_unuse(si, i, frontswap)) != 0) {
+retry:
 		if (signal_pending(current)) {
 			retval = -EINTR;
 			break;
@@ -2246,6 +2322,8 @@ int try_to_unuse(unsigned int type, bool
 		 * page and read the swap into it.
 		 */
 		swap_map = &si->swap_map[i];
+		if (si->cluster_info)
+			ci = si->cluster_info + i / SWAPFILE_CLUSTER;
 		entry = swp_entry(type, i);
 		page = read_swap_cache_async(entry,
 					GFP_HIGHUSER_MOVABLE, NULL, 0, false);
@@ -2266,6 +2344,12 @@ int try_to_unuse(unsigned int type, bool
 			 */
 			if (!swcount || swcount == SWAP_MAP_BAD)
 				continue;
+			/* Split huge cluster if failed to allocate huge page */
+			if (thp_swap_supported() && cluster_is_huge(ci)) {
+				retval = split_swap_cluster(entry, false);
+				if (!retval || retval == -EEXIST)
+					goto retry;
+			}
 			retval = -ENOMEM;
 			break;
 		}
_

Patches currently in -mm which might be from ying.huang@xxxxxxxxx are

mm-clear_huge_page-move-order-algorithm-into-a-separate-function.patch
mm-huge-page-copy-target-sub-page-last-when-copy-huge-page.patch
mm-hugetlbfs-rename-address-to-haddr-in-hugetlb_cow.patch
mm-hugetlbfs-pass-fault-address-to-cow-handler.patch
mm-swap-fix-race-between-swapoff-and-some-swap-operations.patch
mm-swap-fix-race-between-swapoff-and-some-swap-operations-v6.patch
mm-fix-race-between-swapoff-and-mincore.patch
mm-thp-swap-enable-pmd-swap-operations-for-config_thp_swap.patch
mm-thp-swap-make-config_thp_swap-depends-on-config_swap.patch
mm-thp-swap-support-pmd-swap-mapping-in-swap_duplicate.patch
mm-thp-swap-support-pmd-swap-mapping-in-swapcache_free_cluster.patch
mm-thp-swap-support-pmd-swap-mapping-in-free_swap_and_cache-swap_free.patch
mm-thp-swap-support-pmd-swap-mapping-when-splitting-huge-pmd.patch
mm-thp-swap-support-pmd-swap-mapping-in-split_swap_cluster.patch
mm-thp-swap-support-to-read-a-huge-swap-cluster-for-swapin-a-thp.patch
mm-thp-swap-swapin-a-thp-as-a-whole.patch
mm-thp-swap-support-to-count-thp-swapin-and-its-fallback.patch
mm-thp-swap-add-sysfs-interface-to-configure-thp-swapin.patch
mm-thp-swap-support-pmd-swap-mapping-in-swapoff.patch
mm-thp-swap-support-pmd-swap-mapping-in-madvise_free.patch
mm-cgroup-thp-swap-support-to-move-swap-account-for-pmd-swap-mapping.patch
mm-thp-swap-support-to-copy-pmd-swap-mapping-when-fork.patch
mm-thp-swap-free-pmd-swap-mapping-when-zap_huge_pmd.patch
mm-thp-swap-support-pmd-swap-mapping-for-madv_willneed.patch
mm-thp-swap-support-pmd-swap-mapping-in-mincore.patch
mm-thp-swap-support-pmd-swap-mapping-in-common-path.patch
mm-thp-swap-create-pmd-swap-mapping-when-unmap-the-thp.patch
mm-thp-avoid-to-split-thp-when-reclaim-madv_free-thp.patch