The patch titled Subject: shmem: prepare huge= mount option and sysfs knob has been added to the -mm tree. Its filename is shmem-prepare-huge=-mount-option-and-sysfs-knob.patch This patch should soon appear at http://ozlabs.org/~akpm/mmots/broken-out/shmem-prepare-huge%3D-mount-option-and-sysfs-knob.patch and later at http://ozlabs.org/~akpm/mmotm/broken-out/shmem-prepare-huge%3D-mount-option-and-sysfs-knob.patch Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/SubmitChecklist when testing your code *** The -mm tree is included into linux-next and is updated there every 3-4 working days ------------------------------------------------------ From: "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx> Subject: shmem: prepare huge= mount option and sysfs knob This patch adds new mount option "huge=". It can have following values: - "always": Attempt to allocate huge pages every time we need a new page; - "never": Do not allocate huge pages; - "within_size": Only allocate huge page if it will be fully within i_size. Also respect fadvise()/madvise() hints; - "advise: Only allocate huge pages if requested with fadvise()/madvise(); Default is "never" for now. "mount -o remount,huge= /mountpoint" works fine after mount: remounting huge=never will not attempt to break up huge pages at all, just stop more from being allocated. No new config option: put this under CONFIG_TRANSPARENT_HUGEPAGE, which is the appropriate option to protect those who don't want the new bloat, and with which we shall share some pmd code. Prohibit the option when !CONFIG_TRANSPARENT_HUGEPAGE, just as mpol is invalid without CONFIG_NUMA (was hidden in mpol_parse_str(): make it explicit). Allow enabling THP only if the machine has_transparent_hugepage(). But what about Shmem with no user-visible mount? SysV SHM, memfds, shared anonymous mmaps (of /dev/zero or MAP_ANONYMOUS), GPU drivers' DRM objects, Ashmem. Though unlikely to suit all usages, provide sysfs knob /sys/kernel/mm/transparent_hugepage/shmem_enabled to experiment with huge on those. And allow shmem_enabled two further values: - "deny": For use in emergencies, to force the huge option off from all mounts; - "force": Force the huge option on for all - very useful for testing; Based on patch by Hugh Dickins. Link: http://lkml.kernel.org/r/1465297246-98985-23-git-send-email-kirill.shutemov@xxxxxxxxxxxxxxx Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx> Cc: Hugh Dickins <hughd@xxxxxxxxxx> Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx> Cc: Dave Hansen <dave.hansen@xxxxxxxxx> Cc: Vlastimil Babka <vbabka@xxxxxxx> Cc: Christoph Lameter <cl@xxxxxxxxxx> Cc: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx> Cc: Jerome Marchand <jmarchan@xxxxxxxxxx> Cc: Yang Shi <yang.shi@xxxxxxxxxx> Cc: Sasha Levin <sasha.levin@xxxxxxxxxx> Cc: Andres Lagar-Cavilla <andreslc@xxxxxxxxxx> Cc: Ning Qu <quning@xxxxxxxxx> Cc: Aneesh Kumar K.V <aneesh.kumar@xxxxxxxxxxxxxxxxxx> Cc: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> --- include/linux/huge_mm.h | 2 include/linux/shmem_fs.h | 3 mm/huge_memory.c | 3 mm/shmem.c | 161 +++++++++++++++++++++++++++++++++++++ 4 files changed, 168 insertions(+), 1 deletion(-) diff -puN include/linux/huge_mm.h~shmem-prepare-huge=-mount-option-and-sysfs-knob include/linux/huge_mm.h --- a/include/linux/huge_mm.h~shmem-prepare-huge=-mount-option-and-sysfs-knob +++ a/include/linux/huge_mm.h @@ -41,6 +41,8 @@ enum transparent_hugepage_flag { #endif }; +extern struct kobj_attribute shmem_enabled_attr; + #define HPAGE_PMD_ORDER (HPAGE_PMD_SHIFT-PAGE_SHIFT) #define HPAGE_PMD_NR (1<<HPAGE_PMD_ORDER) diff -puN include/linux/shmem_fs.h~shmem-prepare-huge=-mount-option-and-sysfs-knob include/linux/shmem_fs.h --- a/include/linux/shmem_fs.h~shmem-prepare-huge=-mount-option-and-sysfs-knob +++ a/include/linux/shmem_fs.h @@ -28,9 +28,10 @@ struct shmem_sb_info { unsigned long max_inodes; /* How many inodes are allowed */ unsigned long free_inodes; /* How many are left for allocation */ spinlock_t stat_lock; /* Serialize shmem_sb_info changes */ + umode_t mode; /* Mount mode for root directory */ + unsigned char huge; /* Whether to try for hugepages */ kuid_t uid; /* Mount uid for root directory */ kgid_t gid; /* Mount gid for root directory */ - umode_t mode; /* Mount mode for root directory */ struct mempolicy *mpol; /* default memory policy for mappings */ }; diff -puN mm/huge_memory.c~shmem-prepare-huge=-mount-option-and-sysfs-knob mm/huge_memory.c --- a/mm/huge_memory.c~shmem-prepare-huge=-mount-option-and-sysfs-knob +++ a/mm/huge_memory.c @@ -443,6 +443,9 @@ static struct attribute *hugepage_attr[] &enabled_attr.attr, &defrag_attr.attr, &use_zero_page_attr.attr, +#ifdef CONFIG_SHMEM + &shmem_enabled_attr.attr, +#endif #ifdef CONFIG_DEBUG_VM &debug_cow_attr.attr, #endif diff -puN mm/shmem.c~shmem-prepare-huge=-mount-option-and-sysfs-knob mm/shmem.c --- a/mm/shmem.c~shmem-prepare-huge=-mount-option-and-sysfs-knob +++ a/mm/shmem.c @@ -289,6 +289,87 @@ static bool shmem_confirm_swap(struct ad } /* + * Definitions for "huge tmpfs": tmpfs mounted with the huge= option + * + * SHMEM_HUGE_NEVER: + * disables huge pages for the mount; + * SHMEM_HUGE_ALWAYS: + * enables huge pages for the mount; + * SHMEM_HUGE_WITHIN_SIZE: + * only allocate huge pages if the page will be fully within i_size, + * also respect fadvise()/madvise() hints; + * SHMEM_HUGE_ADVISE: + * only allocate huge pages if requested with fadvise()/madvise(); + */ + +#define SHMEM_HUGE_NEVER 0 +#define SHMEM_HUGE_ALWAYS 1 +#define SHMEM_HUGE_WITHIN_SIZE 2 +#define SHMEM_HUGE_ADVISE 3 + +/* + * Special values. + * Only can be set via /sys/kernel/mm/transparent_hugepage/shmem_enabled: + * + * SHMEM_HUGE_DENY: + * disables huge on shm_mnt and all mounts, for emergency use; + * SHMEM_HUGE_FORCE: + * enables huge on shm_mnt and all mounts, w/o needing option, for testing; + * + */ +#define SHMEM_HUGE_DENY (-1) +#define SHMEM_HUGE_FORCE (-2) + +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +/* ifdef here to avoid bloating shmem.o when not necessary */ + +int shmem_huge __read_mostly; + +static int shmem_parse_huge(const char *str) +{ + if (!strcmp(str, "never")) + return SHMEM_HUGE_NEVER; + if (!strcmp(str, "always")) + return SHMEM_HUGE_ALWAYS; + if (!strcmp(str, "within_size")) + return SHMEM_HUGE_WITHIN_SIZE; + if (!strcmp(str, "advise")) + return SHMEM_HUGE_ADVISE; + if (!strcmp(str, "deny")) + return SHMEM_HUGE_DENY; + if (!strcmp(str, "force")) + return SHMEM_HUGE_FORCE; + return -EINVAL; +} + +static const char *shmem_format_huge(int huge) +{ + switch (huge) { + case SHMEM_HUGE_NEVER: + return "never"; + case SHMEM_HUGE_ALWAYS: + return "always"; + case SHMEM_HUGE_WITHIN_SIZE: + return "within_size"; + case SHMEM_HUGE_ADVISE: + return "advise"; + case SHMEM_HUGE_DENY: + return "deny"; + case SHMEM_HUGE_FORCE: + return "force"; + default: + VM_BUG_ON(1); + return "bad_val"; + } +} + +#else /* !CONFIG_TRANSPARENT_HUGEPAGE */ + +#define shmem_huge SHMEM_HUGE_DENY + +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ + +/* * Like add_to_page_cache_locked, but error if expected item has gone. */ static int shmem_add_to_page_cache(struct page *page, @@ -2858,11 +2939,24 @@ static int shmem_parse_options(char *opt sbinfo->gid = make_kgid(current_user_ns(), gid); if (!gid_valid(sbinfo->gid)) goto bad_val; +#ifdef CONFIG_TRANSPARENT_HUGEPAGE + } else if (!strcmp(this_char, "huge")) { + int huge; + huge = shmem_parse_huge(value); + if (huge < 0) + goto bad_val; + if (!has_transparent_hugepage() && + huge != SHMEM_HUGE_NEVER) + goto bad_val; + sbinfo->huge = huge; +#endif +#ifdef CONFIG_NUMA } else if (!strcmp(this_char,"mpol")) { mpol_put(mpol); mpol = NULL; if (mpol_parse_str(value, &mpol)) goto bad_val; +#endif } else { pr_err("tmpfs: Bad mount option %s\n", this_char); goto error; @@ -2908,6 +3002,7 @@ static int shmem_remount_fs(struct super goto out; error = 0; + sbinfo->huge = config.huge; sbinfo->max_blocks = config.max_blocks; sbinfo->max_inodes = config.max_inodes; sbinfo->free_inodes = config.max_inodes - inodes; @@ -2941,6 +3036,11 @@ static int shmem_show_options(struct seq if (!gid_eq(sbinfo->gid, GLOBAL_ROOT_GID)) seq_printf(seq, ",gid=%u", from_kgid_munged(&init_user_ns, sbinfo->gid)); +#ifdef CONFIG_TRANSPARENT_HUGEPAGE + /* Rightly or wrongly, show huge mount option unmasked by shmem_huge */ + if (sbinfo->huge) + seq_printf(seq, ",huge=%s", shmem_format_huge(sbinfo->huge)); +#endif shmem_show_mpol(seq, sbinfo->mpol); return 0; } @@ -3280,6 +3380,13 @@ int __init shmem_init(void) pr_err("Could not kern_mount tmpfs\n"); goto out1; } + +#ifdef CONFIG_TRANSPARENT_HUGEPAGE + if (has_transparent_hugepage() && shmem_huge < SHMEM_HUGE_DENY) + SHMEM_SB(shm_mnt->mnt_sb)->huge = shmem_huge; + else + shmem_huge = 0; /* just in case it was patched */ +#endif return 0; out1: @@ -3291,6 +3398,60 @@ out3: return error; } +#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && defined(CONFIG_SYSFS) +static ssize_t shmem_enabled_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + int values[] = { + SHMEM_HUGE_ALWAYS, + SHMEM_HUGE_WITHIN_SIZE, + SHMEM_HUGE_ADVISE, + SHMEM_HUGE_NEVER, + SHMEM_HUGE_DENY, + SHMEM_HUGE_FORCE, + }; + int i, count; + + for (i = 0, count = 0; i < ARRAY_SIZE(values); i++) { + const char *fmt = shmem_huge == values[i] ? "[%s] " : "%s "; + + count += sprintf(buf + count, fmt, + shmem_format_huge(values[i])); + } + buf[count - 1] = '\n'; + return count; +} + +static ssize_t shmem_enabled_store(struct kobject *kobj, + struct kobj_attribute *attr, const char *buf, size_t count) +{ + char tmp[16]; + int huge; + + if (count + 1 > sizeof(tmp)) + return -EINVAL; + memcpy(tmp, buf, count); + tmp[count] = '\0'; + if (count && tmp[count - 1] == '\n') + tmp[count - 1] = '\0'; + + huge = shmem_parse_huge(tmp); + if (huge == -EINVAL) + return -EINVAL; + if (!has_transparent_hugepage() && + huge != SHMEM_HUGE_NEVER && huge != SHMEM_HUGE_DENY) + return -EINVAL; + + shmem_huge = huge; + if (shmem_huge < SHMEM_HUGE_DENY) + SHMEM_SB(shm_mnt->mnt_sb)->huge = shmem_huge; + return count; +} + +struct kobj_attribute shmem_enabled_attr = + __ATTR(shmem_enabled, 0644, shmem_enabled_show, shmem_enabled_store); +#endif /* CONFIG_TRANSPARENT_HUGEPAGE && CONFIG_SYSFS */ + #else /* !CONFIG_SHMEM */ /* _ Patches currently in -mm which might be from kirill.shutemov@xxxxxxxxxxxxxxx are mm-make-swapin-readahead-to-improve-thp-collapse-rate-fix.patch mm-make-swapin-readahead-to-improve-thp-collapse-rate-fix-2.patch mm-make-swapin-readahead-to-improve-thp-collapse-rate-fix-3.patch mm-thp-make-swapin-readahead-under-down_read-of-mmap_sem-fix.patch thp-mlock-update-unevictable-lrutxt.patch mm-do-not-pass-mm_struct-into-handle_mm_fault.patch mm-introduce-fault_env.patch mm-postpone-page-table-allocation-until-we-have-page-to-map.patch rmap-support-file-thp.patch mm-introduce-do_set_pmd.patch thp-vmstats-add-counters-for-huge-file-pages.patch thp-support-file-pages-in-zap_huge_pmd.patch thp-handle-file-pages-in-split_huge_pmd.patch thp-handle-file-cow-faults.patch thp-skip-file-huge-pmd-on-copy_huge_pmd.patch thp-prepare-change_huge_pmd-for-file-thp.patch thp-run-vma_adjust_trans_huge-outside-i_mmap_rwsem.patch thp-file-pages-support-for-split_huge_page.patch thp-mlock-do-not-mlock-pte-mapped-file-huge-pages.patch vmscan-split-file-huge-pages-before-paging-them-out.patch page-flags-relax-policy-for-pg_mappedtodisk-and-pg_reclaim.patch radix-tree-implement-radix_tree_maybe_preload_order.patch filemap-prepare-find-and-delete-operations-for-huge-pages.patch truncate-handle-file-thp.patch mm-rmap-account-shmem-thp-pages.patch shmem-prepare-huge=-mount-option-and-sysfs-knob.patch shmem-add-huge-pages-support.patch shmem-thp-respect-madv_nohugepage-for-file-mappings.patch thp-extract-khugepaged-from-mm-huge_memoryc.patch khugepaged-move-up_readmmap_sem-out-of-khugepaged_alloc_page.patch shmem-make-shmem_inode_info-lock-irq-safe.patch khugepaged-add-support-of-collapse-for-tmpfs-shmem-pages.patch thp-introduce-config_transparent_huge_pagecache.patch shmem-split-huge-pages-beyond-i_size-under-memory-pressure.patch thp-update-documentation-vm-transhugefilesystems-proctxt.patch a.patch -- To unsubscribe from this list: send the line "unsubscribe mm-commits" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html