The patch titled Subject: mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options has been added to the -mm tree. Its filename is mm-security-introduce-init_on_alloc=1-and-init_on_free=1-boot-options.patch This patch should soon appear at http://ozlabs.org/~akpm/mmots/broken-out/mm-security-introduce-init_on_alloc%3D1-and-init_on_free%3D1-boot-options.patch and later at http://ozlabs.org/~akpm/mmotm/broken-out/mm-security-introduce-init_on_alloc%3D1-and-init_on_free%3D1-boot-options.patch Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next and is updated there every 3-4 working days ------------------------------------------------------ From: Alexander Potapenko <glider@xxxxxxxxxx> Subject: mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options The new options are needed to prevent possible information leaks and make control-flow bugs that depend on uninitialized values more deterministic. init_on_alloc=1 makes the kernel initialize newly allocated pages and heap objects with zeroes. Initialization is done at allocation time at the places where checks for __GFP_ZERO are performed. init_on_free=1 makes the kernel initialize freed pages and heap objects with zeroes upon their deletion. This helps to ensure sensitive data doesn't leak via use-after-free accesses. Both init_on_alloc=1 and init_on_free=1 guarantee that the allocator returns zeroed memory. The two exceptions are slab caches with constructors and SLAB_TYPESAFE_BY_RCU flag. Those are never zero-initialized to preserve their semantics. Both init_on_alloc and init_on_free default to zero, but those defaults can be overridden with CONFIG_INIT_ON_ALLOC_DEFAULT_ON and CONFIG_INIT_ON_FREE_DEFAULT_ON. Slowdown for the new features compared to init_on_free=0, init_on_alloc=0: hackbench, init_on_free=1: +7.62% sys time (st.err 0.74%) hackbench, init_on_alloc=1: +7.75% sys time (st.err 2.14%) Linux build with -j12, init_on_free=1: +8.38% wall time (st.err 0.39%) Linux build with -j12, init_on_free=1: +24.42% sys time (st.err 0.52%) Linux build with -j12, init_on_alloc=1: -0.13% wall time (st.err 0.42%) Linux build with -j12, init_on_alloc=1: +0.57% sys time (st.err 0.40%) The slowdown for init_on_free=0, init_on_alloc=0 compared to the baseline is within the standard error. The new features are also going to pave the way for hardware memory tagging (e.g. arm64's MTE), which will require both on_alloc and on_free hooks to set the tags for heap objects. With MTE, tagging will have the same cost as memory initialization. Although init_on_free is rather costly, there are paranoid use-cases where in-memory data lifetime is desired to be minimized. There are various arguments for/against the realism of the associated threat models, but given that we'll need the infrastructre for MTE anyway, and there are people who want wipe-on-free behavior no matter what the performance cost, it seems reasonable to include it in this series. Link: http://lkml.kernel.org/r/20190617151050.92663-2-glider@xxxxxxxxxx Signed-off-by: Alexander Potapenko <glider@xxxxxxxxxx> Acked-by: Kees Cook <keescook@xxxxxxxxxxxx> Cc: Christoph Lameter <cl@xxxxxxxxx> Cc: Masahiro Yamada <yamada.masahiro@xxxxxxxxxxxxx> Cc: Michal Hocko <mhocko@xxxxxxxxxx> Cc: James Morris <jmorris@xxxxxxxxx> Cc: "Serge E. Hallyn" <serge@xxxxxxxxxx> Cc: Nick Desaulniers <ndesaulniers@xxxxxxxxxx> Cc: Kostya Serebryany <kcc@xxxxxxxxxx> Cc: Dmitry Vyukov <dvyukov@xxxxxxxxxx> Cc: Sandeep Patil <sspatil@xxxxxxxxxxx> Cc: Laura Abbott <labbott@xxxxxxxxxx> Cc: Randy Dunlap <rdunlap@xxxxxxxxxxxxx> Cc: Jann Horn <jannh@xxxxxxxxxx> Cc: Mark Rutland <mark.rutland@xxxxxxx> Cc: Marco Elver <elver@xxxxxxxxxx> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> --- Documentation/admin-guide/kernel-parameters.txt | 9 ++ drivers/infiniband/core/uverbs_ioctl.c | 2 include/linux/mm.h | 22 ++++ kernel/kexec_core.c | 2 mm/dmapool.c | 2 mm/page_alloc.c | 63 ++++++++++++-- mm/slab.c | 16 ++- mm/slab.h | 19 ++++ mm/slub.c | 33 ++++++- net/core/sock.c | 2 security/Kconfig.hardening | 29 ++++++ 11 files changed, 180 insertions(+), 19 deletions(-) --- a/Documentation/admin-guide/kernel-parameters.txt~mm-security-introduce-init_on_alloc=1-and-init_on_free=1-boot-options +++ a/Documentation/admin-guide/kernel-parameters.txt @@ -1671,6 +1671,15 @@ initrd= [BOOT] Specify the location of the initial ramdisk + init_on_alloc= [MM] Fill newly allocated pages and heap objects with + zeroes. + Format: 0 | 1 + Default set by CONFIG_INIT_ON_ALLOC_DEFAULT_ON. + + init_on_free= [MM] Fill freed pages and heap objects with zeroes. + Format: 0 | 1 + Default set by CONFIG_INIT_ON_FREE_DEFAULT_ON. + init_pkru= [x86] Specify the default memory protection keys rights register contents for all processes. 0x55555554 by default (disallow access to all but pkey 0). Can --- a/drivers/infiniband/core/uverbs_ioctl.c~mm-security-introduce-init_on_alloc=1-and-init_on_free=1-boot-options +++ a/drivers/infiniband/core/uverbs_ioctl.c @@ -127,7 +127,7 @@ __malloc void *_uverbs_alloc(struct uver res = (void *)pbundle->internal_buffer + pbundle->internal_used; pbundle->internal_used = ALIGN(new_used, sizeof(*pbundle->internal_buffer)); - if (flags & __GFP_ZERO) + if (want_init_on_alloc(flags)) memset(res, 0, size); return res; } --- a/include/linux/mm.h~mm-security-introduce-init_on_alloc=1-and-init_on_free=1-boot-options +++ a/include/linux/mm.h @@ -2682,6 +2682,28 @@ static inline void kernel_poison_pages(s int enable) { } #endif +#ifdef CONFIG_INIT_ON_ALLOC_DEFAULT_ON +DECLARE_STATIC_KEY_TRUE(init_on_alloc); +#else +DECLARE_STATIC_KEY_FALSE(init_on_alloc); +#endif +static inline bool want_init_on_alloc(gfp_t flags) +{ + if (static_branch_unlikely(&init_on_alloc)) + return true; + return flags & __GFP_ZERO; +} + +#ifdef CONFIG_INIT_ON_FREE_DEFAULT_ON +DECLARE_STATIC_KEY_TRUE(init_on_free); +#else +DECLARE_STATIC_KEY_FALSE(init_on_free); +#endif +static inline bool want_init_on_free(void) +{ + return static_branch_unlikely(&init_on_free); +} + #ifdef CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT DECLARE_STATIC_KEY_TRUE(_debug_pagealloc_enabled); #else --- a/kernel/kexec_core.c~mm-security-introduce-init_on_alloc=1-and-init_on_free=1-boot-options +++ a/kernel/kexec_core.c @@ -315,7 +315,7 @@ static struct page *kimage_alloc_pages(g arch_kexec_post_alloc_pages(page_address(pages), count, gfp_mask); - if (gfp_mask & __GFP_ZERO) + if (want_init_on_alloc(gfp_mask)) for (i = 0; i < count; i++) clear_highpage(pages + i); } --- a/mm/dmapool.c~mm-security-introduce-init_on_alloc=1-and-init_on_free=1-boot-options +++ a/mm/dmapool.c @@ -378,7 +378,7 @@ void *dma_pool_alloc(struct dma_pool *po #endif spin_unlock_irqrestore(&pool->lock, flags); - if (mem_flags & __GFP_ZERO) + if (want_init_on_alloc(mem_flags)) memset(retval, 0, pool->size); return retval; --- a/mm/page_alloc.c~mm-security-introduce-init_on_alloc=1-and-init_on_free=1-boot-options +++ a/mm/page_alloc.c @@ -135,6 +135,48 @@ unsigned long totalcma_pages __read_most int percpu_pagelist_fraction; gfp_t gfp_allowed_mask __read_mostly = GFP_BOOT_MASK; +#ifdef CONFIG_INIT_ON_ALLOC_DEFAULT_ON +DEFINE_STATIC_KEY_TRUE(init_on_alloc); +#else +DEFINE_STATIC_KEY_FALSE(init_on_alloc); +#endif +#ifdef CONFIG_INIT_ON_FREE_DEFAULT_ON +DEFINE_STATIC_KEY_TRUE(init_on_free); +#else +DEFINE_STATIC_KEY_FALSE(init_on_free); +#endif + +static int __init early_init_on_alloc(char *buf) +{ + int ret; + bool bool_result; + + if (!buf) + return -EINVAL; + ret = kstrtobool(buf, &bool_result); + if (bool_result) + static_branch_enable(&init_on_alloc); + else + static_branch_disable(&init_on_alloc); + return ret; +} +early_param("init_on_alloc", early_init_on_alloc); + +static int __init early_init_on_free(char *buf) +{ + int ret; + bool bool_result; + + if (!buf) + return -EINVAL; + ret = kstrtobool(buf, &bool_result); + if (bool_result) + static_branch_enable(&init_on_free); + else + static_branch_disable(&init_on_free); + return ret; +} +early_param("init_on_free", early_init_on_free); /* * A cached value of the page's pageblock's migratetype, used when the page is @@ -1069,6 +1111,14 @@ out: return ret; } +static void kernel_init_free_pages(struct page *page, int numpages) +{ + int i; + + for (i = 0; i < numpages; i++) + clear_highpage(page + i); +} + static __always_inline bool free_pages_prepare(struct page *page, unsigned int order, bool check_free) { @@ -1121,6 +1171,8 @@ static __always_inline bool free_pages_p } arch_free_page(page, order); kernel_poison_pages(page, 1 << order, 0); + if (want_init_on_free()) + kernel_init_free_pages(page, 1 << order); if (debug_pagealloc_enabled()) kernel_map_pages(page, 1 << order, 0); @@ -2020,8 +2072,8 @@ static inline int check_new_page(struct static inline bool free_pages_prezeroed(void) { - return IS_ENABLED(CONFIG_PAGE_POISONING_ZERO) && - page_poisoning_enabled(); + return (IS_ENABLED(CONFIG_PAGE_POISONING_ZERO) && + page_poisoning_enabled()) || want_init_on_free(); } #ifdef CONFIG_DEBUG_VM @@ -2091,13 +2143,10 @@ inline void post_alloc_hook(struct page static void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags, unsigned int alloc_flags) { - int i; - post_alloc_hook(page, order, gfp_flags); - if (!free_pages_prezeroed() && (gfp_flags & __GFP_ZERO)) - for (i = 0; i < (1 << order); i++) - clear_highpage(page + i); + if (!free_pages_prezeroed() && want_init_on_alloc(gfp_flags)) + kernel_init_free_pages(page, 1 << order); if (order && (gfp_flags & __GFP_COMP)) prep_compound_page(page, order); --- a/mm/slab.c~mm-security-introduce-init_on_alloc=1-and-init_on_free=1-boot-options +++ a/mm/slab.c @@ -1811,6 +1811,14 @@ static bool set_objfreelist_slab_cache(s cachep->num = 0; + /* + * If slab auto-initialization on free is enabled, store the freelist + * off-slab, so that its contents don't end up in one of the allocated + * objects. + */ + if (unlikely(slab_want_init_on_free(cachep))) + return false; + if (cachep->ctor || flags & SLAB_TYPESAFE_BY_RCU) return false; @@ -3248,7 +3256,7 @@ slab_alloc_node(struct kmem_cache *cache local_irq_restore(save_flags); ptr = cache_alloc_debugcheck_after(cachep, flags, ptr, caller); - if (unlikely(flags & __GFP_ZERO) && ptr) + if (unlikely(slab_want_init_on_alloc(flags, cachep)) && ptr) memset(ptr, 0, cachep->object_size); slab_post_alloc_hook(cachep, flags, 1, &ptr); @@ -3305,7 +3313,7 @@ slab_alloc(struct kmem_cache *cachep, gf objp = cache_alloc_debugcheck_after(cachep, flags, objp, caller); prefetchw(objp); - if (unlikely(flags & __GFP_ZERO) && objp) + if (unlikely(slab_want_init_on_alloc(flags, cachep)) && objp) memset(objp, 0, cachep->object_size); slab_post_alloc_hook(cachep, flags, 1, &objp); @@ -3426,6 +3434,8 @@ void ___cache_free(struct kmem_cache *ca struct array_cache *ac = cpu_cache_get(cachep); check_irq_off(); + if (unlikely(slab_want_init_on_free(cachep))) + memset(objp, 0, cachep->object_size); kmemleak_free_recursive(objp, cachep->flags); objp = cache_free_debugcheck(cachep, objp, caller); @@ -3513,7 +3523,7 @@ int kmem_cache_alloc_bulk(struct kmem_ca cache_alloc_debugcheck_after_bulk(s, flags, size, p, _RET_IP_); /* Clear memory outside IRQ disabled section */ - if (unlikely(flags & __GFP_ZERO)) + if (unlikely(slab_want_init_on_alloc(flags, s))) for (i = 0; i < size; i++) memset(p[i], 0, s->object_size); --- a/mm/slab.h~mm-security-introduce-init_on_alloc=1-and-init_on_free=1-boot-options +++ a/mm/slab.h @@ -598,4 +598,23 @@ static inline int cache_random_seq_creat static inline void cache_random_seq_destroy(struct kmem_cache *cachep) { } #endif /* CONFIG_SLAB_FREELIST_RANDOM */ +static inline bool slab_want_init_on_alloc(gfp_t flags, struct kmem_cache *c) +{ + if (static_branch_unlikely(&init_on_alloc)) { + if (c->ctor) + return false; + if (c->flags & SLAB_TYPESAFE_BY_RCU) + return flags & __GFP_ZERO; + return true; + } + return flags & __GFP_ZERO; +} + +static inline bool slab_want_init_on_free(struct kmem_cache *c) +{ + if (static_branch_unlikely(&init_on_free)) + return !(c->ctor || (c->flags & SLAB_TYPESAFE_BY_RCU)); + return false; +} + #endif /* MM_SLAB_H */ --- a/mm/slub.c~mm-security-introduce-init_on_alloc=1-and-init_on_free=1-boot-options +++ a/mm/slub.c @@ -1279,6 +1279,12 @@ check_slabs: if (*str == ',') slub_debug_slabs = str + 1; out: + if ((static_branch_unlikely(&init_on_alloc) || + static_branch_unlikely(&init_on_free)) && + (slub_debug & SLAB_POISON)) { + pr_warn("disabling SLAB_POISON: can't be used together with memory auto-initialization\n"); + slub_debug &= ~SLAB_POISON; + } return 1; } @@ -1422,6 +1428,19 @@ static __always_inline bool slab_free_ho static inline bool slab_free_freelist_hook(struct kmem_cache *s, void **head, void **tail) { + + void *object; + void *next = *head; + void *old_tail = *tail ? *tail : *head; + + if (slab_want_init_on_free(s)) + do { + object = next; + next = get_freepointer(s, object); + memset(object, 0, s->size); + set_freepointer(s, object, next); + } while (object != old_tail); + /* * Compiler cannot detect this function can be removed if slab_free_hook() * evaluates to nothing. Thus, catch all relevant config debug options here. @@ -1431,9 +1450,7 @@ static inline bool slab_free_freelist_ho defined(CONFIG_DEBUG_OBJECTS_FREE) || \ defined(CONFIG_KASAN) - void *object; - void *next = *head; - void *old_tail = *tail ? *tail : *head; + next = *head; /* Head and tail of the reconstructed freelist */ *head = NULL; @@ -2729,8 +2746,14 @@ redo: prefetch_freepointer(s, next_object); stat(s, ALLOC_FASTPATH); } + /* + * If the object has been wiped upon free, make sure it's fully + * initialized by zeroing out freelist pointer. + */ + if (unlikely(slab_want_init_on_free(s)) && object) + *(void **)object = NULL; - if (unlikely(gfpflags & __GFP_ZERO) && object) + if (unlikely(slab_want_init_on_alloc(gfpflags, s)) && object) memset(object, 0, s->object_size); slab_post_alloc_hook(s, gfpflags, 1, &object); @@ -3151,7 +3174,7 @@ int kmem_cache_alloc_bulk(struct kmem_ca local_irq_enable(); /* Clear memory outside IRQ disabled fastpath loop */ - if (unlikely(flags & __GFP_ZERO)) { + if (unlikely(slab_want_init_on_alloc(flags, s))) { int j; for (j = 0; j < i; j++) --- a/net/core/sock.c~mm-security-introduce-init_on_alloc=1-and-init_on_free=1-boot-options +++ a/net/core/sock.c @@ -1596,7 +1596,7 @@ static struct sock *sk_prot_alloc(struct sk = kmem_cache_alloc(slab, priority & ~__GFP_ZERO); if (!sk) return sk; - if (priority & __GFP_ZERO) + if (want_init_on_alloc(priority)) sk_prot_clear_nulls(sk, prot->obj_size); } else sk = kmalloc(prot->obj_size, priority); --- a/security/Kconfig.hardening~mm-security-introduce-init_on_alloc=1-and-init_on_free=1-boot-options +++ a/security/Kconfig.hardening @@ -160,6 +160,35 @@ config STACKLEAK_RUNTIME_DISABLE runtime to control kernel stack erasing for kernels built with CONFIG_GCC_PLUGIN_STACKLEAK. +config INIT_ON_ALLOC_DEFAULT_ON + bool "Enable heap memory zeroing on allocation by default" + help + This has the effect of setting "init_on_alloc=1" on the kernel + command line. This can be disabled with "init_on_alloc=0". + When "init_on_alloc" is enabled, all page allocator and slab + allocator memory will be zeroed when allocated, eliminating + many kinds of "uninitialized heap memory" flaws, especially + heap content exposures. The performance impact varies by + workload, but most cases see <1% impact. Some synthetic + workloads have measured as high as 7%. + +config INIT_ON_FREE_DEFAULT_ON + bool "Enable heap memory zeroing on free by default" + help + This has the effect of setting "init_on_free=1" on the kernel + command line. This can be disabled with "init_on_free=0". + Similar to "init_on_alloc", when "init_on_free" is enabled, + all page allocator and slab allocator memory will be zeroed + when freed, eliminating many kinds of "uninitialized heap memory" + flaws, especially heap content exposures. The primary difference + with "init_on_free" is that data lifetime in memory is reduced, + as anything freed is wiped immediately, making live forensics or + cold boot memory attacks unable to recover freed memory contents. + The performance impact varies by workload, but is more expensive + than "init_on_alloc" due to the negative cache effects of + touching "cold" memory areas. Most cases see 3-5% impact. Some + synthetic workloads have measured as high as 8%. + endmenu endmenu _ Patches currently in -mm which might be from glider@xxxxxxxxxx are mm-security-introduce-init_on_alloc=1-and-init_on_free=1-boot-options.patch mm-init-report-memory-auto-initialization-features-at-boot-time.patch lib-introduce-test_meminit-module.patch