On Tue, Apr 19, 2016 at 12:15 AM, Joonsoo Kim <iamjoonsoo.kim@xxxxxxx> wrote: > On Mon, Apr 18, 2016 at 10:14:39AM -0700, Thomas Garnier wrote: >> Provides an optional config (CONFIG_FREELIST_RANDOM) to randomize the >> SLAB freelist. The list is randomized during initialization of a new set >> of pages. The order on different freelist sizes is pre-computed at boot >> for performance. This security feature reduces the predictability of the >> kernel SLAB allocator against heap overflows rendering attacks much less >> stable. > > I'm not familiar on security but it doesn't look much secure than > before. Is there any other way to generate different sequence of freelist > for each new set of pages? Current approach using pre-computed array will > generate same sequence of freelist for all new set of pages having same size > class. Is it sufficient? > I think it is sufficient. There is a tradeoff for performance. We could randomly pick an object from the freelist every time (on slab_get_obj) but I think it will have significant impact (at least 3%). >> For example this attack against SLUB (also applicable against SLAB) >> would be affected: >> https://jon.oberheide.org/blog/2010/09/10/linux-kernel-can-slub-overflow/ >> >> Also, since v4.6 the freelist was moved at the end of the SLAB. It means >> a controllable heap is opened to new attacks not yet publicly discussed. >> A kernel heap overflow can be transformed to multiple use-after-free. >> This feature makes this type of attack harder too. >> >> To generate entropy, we use get_random_bytes_arch because 0 bits of >> entropy is available at that boot stage. In the worse case this function >> will fallback to the get_random_bytes sub API. >> >> The config option name is not specific to the SLAB as this approach will >> be extended to other allocators like SLUB. > > If this feature will be applied to the SLUB, it's better to put common > code to mm/slab_common.c. > I think it might be moved there once we implement the SLUB counterpart but it is too early to define which part will be common. >> >> Performance results highlighted no major changes: >> >> Netperf average on 10 runs: >> >> threads,base,change >> 16,576943.10,585905.90 (101.55%) >> 32,564082.00,569741.20 (101.00%) >> 48,558334.30,561851.20 (100.63%) >> 64,552025.20,556448.30 (100.80%) >> 80,552294.40,551743.10 (99.90%) >> 96,552435.30,547529.20 (99.11%) >> 112,551320.60,550183.20 (99.79%) >> 128,549138.30,550542.70 (100.26%) >> 144,549344.50,544529.10 (99.12%) >> 160,550360.80,539929.30 (98.10%) >> >> slab_test 1 run on boot. After is faster except for odd result on size >> 2048. > > Hmm... It's odd result. It adds more logic and it should > decrease performance. I guess it would be experimental error but > do you have any analysis about this result? > I don't. I am glad to redo the test. I found that slab_test has very different result based on the heap state at the time of the test. If I run the test multiple times, I have really various results on with or without the mitigation (on dedicated hardware). >> >> Before: >> >> Single thread testing >> ===================== >> 1. Kmalloc: Repeatedly allocate then free test >> 10000 times kmalloc(8) -> 137 cycles kfree -> 126 cycles >> 10000 times kmalloc(16) -> 118 cycles kfree -> 119 cycles >> 10000 times kmalloc(32) -> 112 cycles kfree -> 119 cycles >> 10000 times kmalloc(64) -> 126 cycles kfree -> 123 cycles >> 10000 times kmalloc(128) -> 135 cycles kfree -> 131 cycles >> 10000 times kmalloc(256) -> 165 cycles kfree -> 104 cycles >> 10000 times kmalloc(512) -> 174 cycles kfree -> 126 cycles >> 10000 times kmalloc(1024) -> 242 cycles kfree -> 160 cycles >> 10000 times kmalloc(2048) -> 478 cycles kfree -> 239 cycles >> 10000 times kmalloc(4096) -> 747 cycles kfree -> 364 cycles >> 10000 times kmalloc(8192) -> 774 cycles kfree -> 404 cycles >> 10000 times kmalloc(16384) -> 849 cycles kfree -> 430 cycles >> 2. Kmalloc: alloc/free test >> 10000 times kmalloc(8)/kfree -> 118 cycles >> 10000 times kmalloc(16)/kfree -> 118 cycles >> 10000 times kmalloc(32)/kfree -> 118 cycles >> 10000 times kmalloc(64)/kfree -> 121 cycles >> 10000 times kmalloc(128)/kfree -> 118 cycles >> 10000 times kmalloc(256)/kfree -> 115 cycles >> 10000 times kmalloc(512)/kfree -> 115 cycles >> 10000 times kmalloc(1024)/kfree -> 115 cycles >> 10000 times kmalloc(2048)/kfree -> 115 cycles >> 10000 times kmalloc(4096)/kfree -> 115 cycles >> 10000 times kmalloc(8192)/kfree -> 115 cycles >> 10000 times kmalloc(16384)/kfree -> 115 cycles >> >> After: >> >> Single thread testing >> ===================== >> 1. Kmalloc: Repeatedly allocate then free test >> 10000 times kmalloc(8) -> 99 cycles kfree -> 84 cycles >> 10000 times kmalloc(16) -> 88 cycles kfree -> 83 cycles >> 10000 times kmalloc(32) -> 90 cycles kfree -> 81 cycles >> 10000 times kmalloc(64) -> 107 cycles kfree -> 97 cycles >> 10000 times kmalloc(128) -> 134 cycles kfree -> 89 cycles >> 10000 times kmalloc(256) -> 145 cycles kfree -> 97 cycles >> 10000 times kmalloc(512) -> 177 cycles kfree -> 116 cycles >> 10000 times kmalloc(1024) -> 223 cycles kfree -> 151 cycles >> 10000 times kmalloc(2048) -> 1429 cycles kfree -> 221 cycles >> 10000 times kmalloc(4096) -> 720 cycles kfree -> 348 cycles >> 10000 times kmalloc(8192) -> 788 cycles kfree -> 393 cycles >> 10000 times kmalloc(16384) -> 867 cycles kfree -> 433 cycles >> 2. Kmalloc: alloc/free test >> 10000 times kmalloc(8)/kfree -> 115 cycles >> 10000 times kmalloc(16)/kfree -> 115 cycles >> 10000 times kmalloc(32)/kfree -> 115 cycles >> 10000 times kmalloc(64)/kfree -> 120 cycles >> 10000 times kmalloc(128)/kfree -> 127 cycles >> 10000 times kmalloc(256)/kfree -> 119 cycles >> 10000 times kmalloc(512)/kfree -> 112 cycles >> 10000 times kmalloc(1024)/kfree -> 112 cycles >> 10000 times kmalloc(2048)/kfree -> 112 cycles >> 10000 times kmalloc(4096)/kfree -> 112 cycles >> 10000 times kmalloc(8192)/kfree -> 112 cycles >> 10000 times kmalloc(16384)/kfree -> 112 cycles >> >> Signed-off-by: Thomas Garnier <thgarnie@xxxxxxxxxx> >> --- >> Based on next-20160418 >> --- >> init/Kconfig | 9 ++++ >> mm/slab.c | 166 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++- >> 2 files changed, 174 insertions(+), 1 deletion(-) >> >> diff --git a/init/Kconfig b/init/Kconfig >> index 0dfd09d..ee35418 100644 >> --- a/init/Kconfig >> +++ b/init/Kconfig >> @@ -1742,6 +1742,15 @@ config SLOB >> >> endchoice >> >> +config FREELIST_RANDOM >> + default n >> + depends on SLAB >> + bool "SLAB freelist randomization" >> + help >> + Randomizes the freelist order used on creating new SLABs. This >> + security feature reduces the predictability of the kernel slab >> + allocator against heap overflows. >> + >> config SLUB_CPU_PARTIAL >> default y >> depends on SLUB && SMP >> diff --git a/mm/slab.c b/mm/slab.c >> index b70aabf..8371d80 100644 >> --- a/mm/slab.c >> +++ b/mm/slab.c >> @@ -116,6 +116,7 @@ >> #include <linux/kmemcheck.h> >> #include <linux/memory.h> >> #include <linux/prefetch.h> >> +#include <linux/log2.h> >> >> #include <net/sock.h> >> >> @@ -1229,6 +1230,62 @@ static void __init set_up_node(struct kmem_cache *cachep, int index) >> } >> } >> >> +#ifdef CONFIG_FREELIST_RANDOM >> +/* >> + * Master lists are pre-computed random lists >> + * Lists of different sizes are used to optimize performance on SLABS with >> + * different object counts. >> + */ > > If it is for optimization, it would be one option to have separate > random list for each kmem_cache. It would consume more memory but it > would be marginal. And, it provides more un-predictability and it can > give better performance because we don't need state->type (more, less) > and special handling related for it. > I am not sur because major caches are created early at boot time. We still have the same entropy problem and we are wasting a bit more memory. It will be faster on usage though but not sure it will be significant. >> +static freelist_idx_t master_list_2[2]; >> +static freelist_idx_t master_list_4[4]; >> +static freelist_idx_t master_list_8[8]; >> +static freelist_idx_t master_list_16[16]; >> +static freelist_idx_t master_list_32[32]; >> +static freelist_idx_t master_list_64[64]; >> +static freelist_idx_t master_list_128[128]; >> +static freelist_idx_t master_list_256[256]; >> +const static struct m_list { >> + size_t count; >> + freelist_idx_t *list; >> +} master_lists[] = { >> + { ARRAY_SIZE(master_list_2), master_list_2 }, >> + { ARRAY_SIZE(master_list_4), master_list_4 }, >> + { ARRAY_SIZE(master_list_8), master_list_8 }, >> + { ARRAY_SIZE(master_list_16), master_list_16 }, >> + { ARRAY_SIZE(master_list_32), master_list_32 }, >> + { ARRAY_SIZE(master_list_64), master_list_64 }, >> + { ARRAY_SIZE(master_list_128), master_list_128 }, >> + { ARRAY_SIZE(master_list_256), master_list_256 }, >> +}; >> + >> +/* Pre-compute the Freelist master lists at boot */ >> +static void __init freelist_random_init(void) >> +{ >> + unsigned int seed; >> + size_t z, i, rand; >> + struct rnd_state slab_rand; >> + >> + get_random_bytes_arch(&seed, sizeof(seed)); >> + prandom_seed_state(&slab_rand, seed); >> + >> + for (z = 0; z < ARRAY_SIZE(master_lists); z++) { >> + for (i = 0; i < master_lists[z].count; i++) >> + master_lists[z].list[i] = i; >> + >> + /* Fisher-Yates shuffle */ >> + for (i = master_lists[z].count - 1; i > 0; i--) { >> + rand = prandom_u32_state(&slab_rand); >> + rand %= (i + 1); >> + swap(master_lists[z].list[i], >> + master_lists[z].list[rand]); >> + } >> + } >> +} >> +#else >> +static inline void __init freelist_random_init(void) { } >> +#endif /* CONFIG_FREELIST_RANDOM */ >> + >> + >> /* >> * Initialisation. Called after the page allocator have been initialised and >> * before smp_init(). >> @@ -1255,6 +1312,8 @@ void __init kmem_cache_init(void) >> if (!slab_max_order_set && totalram_pages > (32 << 20) >> PAGE_SHIFT) >> slab_max_order = SLAB_MAX_ORDER_HI; >> >> + freelist_random_init(); >> + >> /* Bootstrap is tricky, because several objects are allocated >> * from caches that do not exist yet: >> * 1) initialize the kmem_cache cache: it contains the struct >> @@ -2442,6 +2501,107 @@ static void cache_init_objs_debug(struct kmem_cache *cachep, struct page *page) >> #endif >> } >> >> +#ifdef CONFIG_FREELIST_RANDOM >> +/* Identify if the target freelist matches the pre-computed list */ >> +enum master_type { >> + match, >> + less, >> + more >> +}; >> + >> +/* Hold information during a freelist initialization */ >> +struct freelist_init_state { >> + unsigned int padding; >> + unsigned int pos; >> + unsigned int count; >> + struct m_list master_list; >> + unsigned int master_count; >> + enum master_type type; >> +}; >> + >> +/* Select the right pre-computed master list and initialize state */ >> +static void freelist_state_initialize(struct freelist_init_state *state, >> + unsigned int count) >> +{ >> + unsigned int idx; >> + const unsigned int last_idx = ARRAY_SIZE(master_lists) - 1; >> + >> + memset(state, 0, sizeof(*state)); >> + state->count = count; >> + state->pos = 0; > > Using pos = 0 here looks not good in terms of security. In this case, > every new page having same size class have same sequence of freelist since boot. > > How about using random value to set pos? It provides some more randomness > with minimal overhead. > I think it is a good idea. I will add that for the next iteration. >> + /* count is always >= 2 */ >> + idx = ilog2(count) - 1; >> + if (idx >= last_idx) >> + idx = last_idx; >> + else if (roundup_pow_of_two(idx + 1) != count) >> + idx++; >> + state->master_list = master_lists[idx]; >> + if (state->master_list.count == state->count) >> + state->type = match; >> + else if (state->master_list.count > state->count) >> + state->type = more; >> + else >> + state->type = less; >> +} >> + >> +/* Get the next entry on the master list depending on the target list size */ >> +static freelist_idx_t get_next_entry(struct freelist_init_state *state) >> +{ >> + if (state->type == less && state->pos == state->master_list.count) { >> + state->padding += state->pos; >> + state->pos = 0; >> + } >> + BUG_ON(state->pos >= state->master_list.count); >> + return state->master_list.list[state->pos++]; >> +} >> + >> +static freelist_idx_t next_random_slot(struct freelist_init_state *state) >> +{ >> + freelist_idx_t cur, entry; >> + >> + entry = get_next_entry(state); >> + >> + if (state->type != match) { >> + while ((entry + state->padding) >= state->count) >> + entry = get_next_entry(state); >> + cur = entry + state->padding; >> + BUG_ON(cur >= state->count); >> + } else { >> + cur = entry; >> + } >> + >> + return cur; >> +} >> + >> +/* Shuffle the freelist initialization state based on pre-computed lists */ >> +static void shuffle_freelist(struct kmem_cache *cachep, struct page *page, >> + unsigned int count) >> +{ >> + unsigned int i; >> + struct freelist_init_state state; >> + >> + if (count < 2) { >> + for (i = 0; i < count; i++) >> + set_free_obj(page, i, i); >> + return; >> + } >> + >> + /* Last chunk is used already in this case */ >> + if (OBJFREELIST_SLAB(cachep)) >> + count--; >> + >> + freelist_state_initialize(&state, count); >> + for (i = 0; i < count; i++) >> + set_free_obj(page, i, next_random_slot(&state)); >> + >> + if (OBJFREELIST_SLAB(cachep)) >> + set_free_obj(page, i, i); > > Please consider last object of OBJFREELIST_SLAB cache, too. > > freelist_state_init() > last_obj = next_randome_slot() > page->freelist = XXX > for (i = 0; i < count - 1; i++) > set_free_obj() > set_free_obj(last_obj); > > Thanks. > The current implementation take the last chunk by default before the freelist is initialized. Do you want it to be randomized as well? >> +} >> +#else >> +static inline void shuffle_freelist(struct kmem_cache *cachep, >> + struct page *page, unsigned int count) { } >> +#endif /* CONFIG_FREELIST_RANDOM */ >> + >> static void cache_init_objs(struct kmem_cache *cachep, >> struct page *page) >> { >> @@ -2464,8 +2624,12 @@ static void cache_init_objs(struct kmem_cache *cachep, >> kasan_poison_object_data(cachep, objp); >> } >> >> - set_free_obj(page, i, i); >> + /* If enabled, initialization is done in shuffle_freelist */ >> + if (!config_enabled(CONFIG_FREELIST_RANDOM)) >> + set_free_obj(page, i, i); >> } >> + >> + shuffle_freelist(cachep, page, cachep->num); >> } >> >> static void kmem_flagcheck(struct kmem_cache *cachep, gfp_t flags) >> -- >> 2.8.0.rc3.226.g39d4020 >> >> -- >> To unsubscribe, send a message with 'unsubscribe linux-mm' in >> the body to majordomo@xxxxxxxxx. For more info on Linux MM, >> see: http://www.linux-mm.org/ . >> Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>