The patch titled
     Subject: mm, THP, swap: add sysfs interface to configure THP swapin
has been added to the -mm tree.  Its filename is
     mm-thp-swap-add-sysfs-interface-to-configure-thp-swapin.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-thp-swap-add-sysfs-interface-to-configure-thp-swapin.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-thp-swap-add-sysfs-interface-to-configure-thp-swapin.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Huang Ying <ying.huang@xxxxxxxxx>
Subject: mm, THP, swap: add sysfs interface to configure THP swapin

Swapping in a THP as a whole isn't desirable in some situations.  For
example, with a random access pattern, swapping in a whole THP greatly
inflates the amount of data read.  So a sysfs interface,
/sys/kernel/mm/transparent_hugepage/swapin_enabled, is added to configure
it.  The following three options are provided:

- always: THP swapin will always be enabled.

- madvise: THP swapin will be enabled only for VMAs with the VM_HUGEPAGE
  flag set.

- never: THP swapin will always be disabled.

The default configuration is madvise.

During a page fault, if a PMD swap mapping is found and THP swapin is
disabled, the huge swap cluster and the PMD swap mapping will be split,
and the fault will fall back to normal page swapin.

Link: http://lkml.kernel.org/r/20180622035151.6676-12-ying.huang@xxxxxxxxx
Signed-off-by: "Huang, Ying" <ying.huang@xxxxxxxxx>
Cc: "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx>
Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
Cc: Michal Hocko <mhocko@xxxxxxxx>
Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
Cc: Shaohua Li <shli@xxxxxxxxxx>
Cc: Hugh Dickins <hughd@xxxxxxxxxx>
Cc: Minchan Kim <minchan@xxxxxxxxxx>
Cc: Rik van Riel <riel@xxxxxxxxxx>
Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
Cc: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
Cc: Zi Yan <zi.yan@xxxxxxxxxxxxxx>
Cc: Daniel Jordan <daniel.m.jordan@xxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

diff -puN Documentation/admin-guide/mm/transhuge.rst~mm-thp-swap-add-sysfs-interface-to-configure-thp-swapin Documentation/admin-guide/mm/transhuge.rst
--- a/Documentation/admin-guide/mm/transhuge.rst~mm-thp-swap-add-sysfs-interface-to-configure-thp-swapin
+++ a/Documentation/admin-guide/mm/transhuge.rst
@@ -160,6 +160,27 @@ library) may want to know the size (in b
 
 	cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
 
+Transparent hugepage may be swapout and swapin in one piece without
+splitting. This will improve the utility of transparent hugepage but
+may inflate the read/write too. So whether to enable swapin
+transparent hugepage in one piece can be configured as follow.
+
+	echo always >/sys/kernel/mm/transparent_hugepage/swapin_enabled
+	echo madvise >/sys/kernel/mm/transparent_hugepage/swapin_enabled
+	echo never >/sys/kernel/mm/transparent_hugepage/swapin_enabled
+
+always
+	Attempt to allocate a transparent huge page and read it from
+	swap space in one piece every time.
+
+never
+	Always split the swap space and PMD swap mapping and swapin
+	the fault normal page during swapin.
+
+madvise
+	Only swapin the transparent huge page in one piece for
+	MADV_HUGEPAGE madvise regions.
+
 khugepaged will be automatically started when
 transparent_hugepage/enabled is set to "always" or "madvise, and it'll
 be automatically shutdown if it's set to "never".
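The three swapin_enabled modes reduce to a small per-VMA decision. The following is a simplified user-space model of that decision, written only to illustrate the documented semantics; it is not kernel code. The DEMO_* flag values and the demo_thp_swapin_allowed() name are hypothetical stand-ins, and the model omits the temporary-stack check that the patch's transparent_hugepage_swapin_enabled() also performs. When such a check says no, the patch splits the huge swap cluster and the PMD swap mapping and falls back to normal page swapin.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-ins for the kernel's VM_HUGEPAGE / VM_NOHUGEPAGE
 * bits; the real values are defined in the kernel headers.
 */
#define DEMO_VM_HUGEPAGE	(1u << 0)	/* VMA marked with MADV_HUGEPAGE */
#define DEMO_VM_NOHUGEPAGE	(1u << 1)	/* VMA marked with MADV_NOHUGEPAGE */

/* The three values accepted by swapin_enabled. */
enum demo_swapin_mode {
	DEMO_SWAPIN_ALWAYS,
	DEMO_SWAPIN_MADVISE,
	DEMO_SWAPIN_NEVER,
};

/*
 * Simplified model: may a PMD swap mapping in this VMA be swapped in as
 * one THP?  mm_thp_disabled models MMF_DISABLE_THP (set through
 * prctl(PR_SET_THP_DISABLE)).  The temporary-stack check is omitted.
 */
bool demo_thp_swapin_allowed(enum demo_swapin_mode mode,
			     unsigned int vm_flags, bool mm_thp_disabled)
{
	/* madvise() keeps the two hint bits mutually exclusive. */
	assert(!((vm_flags & DEMO_VM_HUGEPAGE) &&
		 (vm_flags & DEMO_VM_NOHUGEPAGE)));

	/* Per-VMA and per-process opt-outs win over the global mode. */
	if ((vm_flags & DEMO_VM_NOHUGEPAGE) || mm_thp_disabled)
		return false;

	switch (mode) {
	case DEMO_SWAPIN_ALWAYS:
		return true;
	case DEMO_SWAPIN_MADVISE:
		/* Only regions the application opted in with MADV_HUGEPAGE. */
		return (vm_flags & DEMO_VM_HUGEPAGE) != 0;
	case DEMO_SWAPIN_NEVER:
	default:
		return false;
	}
}
```

The ordering mirrors the patch: the opt-outs are tested before the global mode, so even "always" cannot override a VMA marked MADV_NOHUGEPAGE or a process that disabled THP.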
diff -puN include/linux/huge_mm.h~mm-thp-swap-add-sysfs-interface-to-configure-thp-swapin include/linux/huge_mm.h
--- a/include/linux/huge_mm.h~mm-thp-swap-add-sysfs-interface-to-configure-thp-swapin
+++ a/include/linux/huge_mm.h
@@ -62,6 +62,8 @@ enum transparent_hugepage_flag {
 #ifdef CONFIG_DEBUG_VM
 	TRANSPARENT_HUGEPAGE_DEBUG_COW_FLAG,
 #endif
+	TRANSPARENT_HUGEPAGE_SWAPIN_FLAG,
+	TRANSPARENT_HUGEPAGE_SWAPIN_REQ_MADV_FLAG,
 };
 
 struct kobject;
@@ -404,11 +406,40 @@ static inline gfp_t alloc_hugepage_direc
 
 #ifdef CONFIG_THP_SWAP
 extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd);
+
+static inline bool transparent_hugepage_swapin_enabled(
+	struct vm_area_struct *vma)
+{
+	if (vma->vm_flags & VM_NOHUGEPAGE)
+		return false;
+
+	if (is_vma_temporary_stack(vma))
+		return false;
+
+	if (test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
+		return false;
+
+	if (transparent_hugepage_flags &
+			(1 << TRANSPARENT_HUGEPAGE_SWAPIN_FLAG))
+		return true;
+
+	if (transparent_hugepage_flags &
+			(1 << TRANSPARENT_HUGEPAGE_SWAPIN_REQ_MADV_FLAG))
+		return !!(vma->vm_flags & VM_HUGEPAGE);
+
+	return false;
+}
 #else /* CONFIG_THP_SWAP */
 static inline int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
 {
 	return 0;
 }
+
+static inline bool transparent_hugepage_swapin_enabled(
+	struct vm_area_struct *vma)
+{
+	return false;
+}
 #endif /* CONFIG_THP_SWAP */
 
 #endif /* _LINUX_HUGE_MM_H */
diff -puN mm/huge_memory.c~mm-thp-swap-add-sysfs-interface-to-configure-thp-swapin mm/huge_memory.c
--- a/mm/huge_memory.c~mm-thp-swap-add-sysfs-interface-to-configure-thp-swapin
+++ a/mm/huge_memory.c
@@ -57,7 +57,8 @@ unsigned long transparent_hugepage_flags
 #endif
 	(1<<TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG)|
 	(1<<TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG)|
-	(1<<TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG);
+	(1<<TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG)|
+	(1<<TRANSPARENT_HUGEPAGE_SWAPIN_REQ_MADV_FLAG);
 
 static struct shrinker deferred_split_shrinker;
 
@@ -316,6 +317,53 @@ static struct kobj_attribute debug_cow_a
 	__ATTR(debug_cow, 0644, debug_cow_show, debug_cow_store);
 #endif /* CONFIG_DEBUG_VM */
 
+#ifdef CONFIG_THP_SWAP
+static ssize_t swapin_enabled_show(struct kobject *kobj,
+				   struct kobj_attribute *attr, char *buf)
+{
+	if (test_bit(TRANSPARENT_HUGEPAGE_SWAPIN_FLAG,
+		     &transparent_hugepage_flags))
+		return sprintf(buf, "[always] madvise never\n");
+	else if (test_bit(TRANSPARENT_HUGEPAGE_SWAPIN_REQ_MADV_FLAG,
+			  &transparent_hugepage_flags))
+		return sprintf(buf, "always [madvise] never\n");
+	else
+		return sprintf(buf, "always madvise [never]\n");
+}
+
+static ssize_t swapin_enabled_store(struct kobject *kobj,
+				    struct kobj_attribute *attr,
+				    const char *buf, size_t count)
+{
+	ssize_t ret = count;
+
+	if (!memcmp("always", buf,
+		    min(sizeof("always")-1, count))) {
+		clear_bit(TRANSPARENT_HUGEPAGE_SWAPIN_REQ_MADV_FLAG,
+			  &transparent_hugepage_flags);
+		set_bit(TRANSPARENT_HUGEPAGE_SWAPIN_FLAG,
+			&transparent_hugepage_flags);
+	} else if (!memcmp("madvise", buf,
+			   min(sizeof("madvise")-1, count))) {
+		clear_bit(TRANSPARENT_HUGEPAGE_SWAPIN_FLAG,
+			  &transparent_hugepage_flags);
+		set_bit(TRANSPARENT_HUGEPAGE_SWAPIN_REQ_MADV_FLAG,
+			&transparent_hugepage_flags);
+	} else if (!memcmp("never", buf,
+			   min(sizeof("never")-1, count))) {
+		clear_bit(TRANSPARENT_HUGEPAGE_SWAPIN_FLAG,
+			  &transparent_hugepage_flags);
+		clear_bit(TRANSPARENT_HUGEPAGE_SWAPIN_REQ_MADV_FLAG,
+			  &transparent_hugepage_flags);
+	} else
+		ret = -EINVAL;
+
+	return ret;
+}
+static struct kobj_attribute swapin_enabled_attr =
+	__ATTR(swapin_enabled, 0644, swapin_enabled_show, swapin_enabled_store);
+#endif /* CONFIG_THP_SWAP */
+
 static struct attribute *hugepage_attr[] = {
 	&enabled_attr.attr,
 	&defrag_attr.attr,
@@ -327,6 +375,9 @@ static struct attribute *hugepage_attr[]
 #ifdef CONFIG_DEBUG_VM
 	&debug_cow_attr.attr,
 #endif
+#ifdef CONFIG_THP_SWAP
+	&swapin_enabled_attr.attr,
+#endif
 	NULL,
 };
 
@@ -1646,6 +1697,9 @@ int do_huge_pmd_swap_page(struct vm_faul
 retry:
 	page = lookup_swap_cache(entry, NULL, vmf->address);
 	if (!page) {
+		if (!transparent_hugepage_swapin_enabled(vma))
+			goto split;
+
 		page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE, vma,
 					     haddr, false);
 		if (!page) {
@@ -1653,23 +1707,8 @@ retry:
 			 * Back out if somebody else faulted in this pmd
 			 * while we released the pmd lock.
 			 */
-			if (likely(pmd_same(*vmf->pmd, orig_pmd))) {
-				ret = split_swap_cluster(entry, false);
-				/*
-				 * Retry if somebody else swap in the swap
-				 * entry
-				 */
-				if (ret == -EEXIST) {
-					ret = 0;
-					goto retry;
-				/* swapoff occurs under us */
-				} else if (ret == -EINVAL)
-					ret = 0;
-				else {
-					count_vm_event(THP_SWPIN_FALLBACK);
-					goto fallback;
-				}
-			}
+			if (likely(pmd_same(*vmf->pmd, orig_pmd)))
+				goto split;
 			delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
 			goto out;
 		}
@@ -1781,6 +1820,20 @@ fallback:
 	if (page)
 		put_page(page);
 	return ret;
+split:
+	ret = split_swap_cluster(entry, false);
+	/* Retry if somebody else swap in the swap entry */
+	if (ret == -EEXIST) {
+		ret = 0;
+		goto retry;
+	}
+	/* swapoff occurs under us */
+	if (ret == -EINVAL) {
+		delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
+		return 0;
+	}
+	count_vm_event(THP_SWPIN_FALLBACK);
+	goto fallback;
 }
 #else
 static inline void __split_huge_swap_pmd(struct vm_area_struct *vma,
_

Patches currently in -mm which might be from ying.huang@xxxxxxxxx are

mm-clear_huge_page-move-order-algorithm-into-a-separate-function.patch
mm-huge-page-copy-target-sub-page-last-when-copy-huge-page.patch
mm-hugetlbfs-rename-address-to-haddr-in-hugetlb_cow.patch
mm-hugetlbfs-pass-fault-address-to-cow-handler.patch
mm-swap-fix-race-between-swapoff-and-some-swap-operations.patch
mm-swap-fix-race-between-swapoff-and-some-swap-operations-v6.patch
mm-fix-race-between-swapoff-and-mincore.patch
mm-thp-swap-enable-pmd-swap-operations-for-config_thp_swap.patch
mm-thp-swap-make-config_thp_swap-depends-on-config_swap.patch
mm-thp-swap-support-pmd-swap-mapping-in-swap_duplicate.patch
mm-thp-swap-support-pmd-swap-mapping-in-swapcache_free_cluster.patch
mm-thp-swap-support-pmd-swap-mapping-in-free_swap_and_cache-swap_free.patch
mm-thp-swap-support-pmd-swap-mapping-when-splitting-huge-pmd.patch
mm-thp-swap-support-pmd-swap-mapping-in-split_swap_cluster.patch
mm-thp-swap-support-to-read-a-huge-swap-cluster-for-swapin-a-thp.patch
mm-thp-swap-swapin-a-thp-as-a-whole.patch
mm-thp-swap-support-to-count-thp-swapin-and-its-fallback.patch
mm-thp-swap-add-sysfs-interface-to-configure-thp-swapin.patch
mm-thp-swap-support-pmd-swap-mapping-in-swapoff.patch
mm-thp-swap-support-pmd-swap-mapping-in-madvise_free.patch
mm-cgroup-thp-swap-support-to-move-swap-account-for-pmd-swap-mapping.patch
mm-thp-swap-support-to-copy-pmd-swap-mapping-when-fork.patch
mm-thp-swap-free-pmd-swap-mapping-when-zap_huge_pmd.patch
mm-thp-swap-support-pmd-swap-mapping-for-madv_willneed.patch
mm-thp-swap-support-pmd-swap-mapping-in-mincore.patch
mm-thp-swap-support-pmd-swap-mapping-in-common-path.patch
mm-thp-swap-create-pmd-swap-mapping-when-unmap-the-thp.patch
mm-thp-avoid-to-split-thp-when-reclaim-madv_free-thp.patch
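swapin_enabled_show() in the patch reports the current setting by bracketing the active mode, e.g. "always [madvise] never", the same convention the existing transparent_hugepage/enabled file uses. A minimal user-space sketch for reading it back might look like the following. The demo_* function names are hypothetical, and the swapin_enabled file exists only on kernels that carry this patch with CONFIG_THP_SWAP enabled, so a failed open should be treated as "not supported" rather than as an error.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copy the bracketed mode out of a sysfs string such
 * as "always [madvise] never\n" into out.  Returns 0 on success, -1 if no
 * non-empty [...] token is found or it does not fit in out.
 */
int demo_parse_bracketed_mode(const char *show, char *out, size_t outlen)
{
	const char *open, *close;
	size_t len;

	assert(show != NULL && out != NULL);

	open = strchr(show, '[');
	if (!open)
		return -1;
	close = strchr(open + 1, ']');
	if (!close)
		return -1;

	len = (size_t)(close - open - 1);
	if (len == 0 || len >= outlen)
		return -1;

	memcpy(out, open + 1, len);
	out[len] = '\0';
	return 0;
}

/*
 * Hypothetical helper: read the first line of a sysfs file such as
 * /sys/kernel/mm/transparent_hugepage/swapin_enabled and extract the
 * active mode.  Returns -1 if the file is missing or unreadable.
 */
int demo_read_swapin_mode(const char *path, char *out, size_t outlen)
{
	char buf[128];
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return demo_parse_bracketed_mode(buf, out, outlen);
}
```

On a kernel with this patch applied and the default configuration, demo_read_swapin_mode("/sys/kernel/mm/transparent_hugepage/swapin_enabled", ...) should yield "madvise". The same parser also works on the existing transparent_hugepage/enabled and defrag files, which use the same bracket convention.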