On Fri, Jun 16, 2023 at 07:18:43PM +0800, GONG, Ruiqi wrote:
> When exploiting memory vulnerabilities, "heap spraying" is a common
> technique targeting those related to dynamic memory allocation (i.e. the
> "heap"), and it plays an important role in a successful exploitation.
> Basically, it is to overwrite the memory area of vulnerable object by
> triggering allocation in other subsystems or modules and therefore
> getting a reference to the targeted memory location. It's usable on
> various types of vulnerability including use after free (UAF), heap out-
> of-bound write and etc.
>
> There are (at least) two reasons why the heap can be sprayed: 1) generic
> slab caches are shared among different subsystems and modules, and
> 2) dedicated slab caches could be merged with the generic ones.
> Currently these two factors cannot be prevented at a low cost: the first
> one is a widely used memory allocation mechanism, and shutting down slab
> merging completely via `slub_nomerge` would be overkill.
>
> To efficiently prevent heap spraying, we propose the following approach:
> to create multiple copies of generic slab caches that will never be
> merged, and random one of them will be used at allocation. The random
> selection is based on the address of code that calls `kmalloc()`, which
> means it is static at runtime (rather than dynamically determined at
> each time of allocation, which could be bypassed by repeatedly spraying
> in brute force). In other words, the randomness of cache selection will
> be with respect to the code address rather than time, i.e. allocations
> in different code paths would most likely pick different caches,
> although kmalloc() at each place would use the same cache copy whenever
> it is executed. In this way, the vulnerable object and memory allocated
> in other subsystems and modules will (most probably) be on different
> slab caches, which prevents the object from being sprayed.
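
For illustration only (this is not part of the patch): a minimal
userspace sketch of the selection scheme described above. It assumes the
cache copy is chosen by hashing the caller address, mixed with a seed,
down to a few bits. hash_64_sketch() is a stand-in that is assumed to
mirror the multiplicative hash in <linux/hash.h>; pick_cache_copy(),
CACHE_BITS and the seed argument are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Multiplier assumed to match GOLDEN_RATIO_64 in <linux/hash.h>. */
#define GOLDEN_RATIO_64 0x61C8864680B583EBull

/* Hypothetical: 4 bits of hash output, i.e. 16 cache copies. */
#define CACHE_BITS 4

/* Multiplicative hash: keep only the top `bits` bits of the product. */
static inline unsigned int hash_64_sketch(uint64_t val, unsigned int bits)
{
	assert(bits >= 1 && bits <= 32);
	return (unsigned int)((val * GOLDEN_RATIO_64) >> (64 - bits));
}

/*
 * Pick a cache copy for one call site. The result depends only on the
 * caller address and the seed, so a given call site maps to the same
 * copy every time it runs with the same seed.
 */
static inline unsigned int pick_cache_copy(uint64_t caller, uint64_t seed)
{
	return hash_64_sketch(caller ^ seed, CACHE_BITS);
}
```

Calling pick_cache_copy() twice with the same caller address and seed
returns the same index, which is the "static at runtime" property the
description relies on; changing the seed changes the mapping.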
>
> Meanwhile, the static random selection is further enhanced with a
> per-boot random seed, which prevents the attacker from finding a usable
> kmalloc that happens to pick the same cache with the vulnerable
> subsystem/module by analyzing the open source code.
>
> The overhead of performance has been tested on a 40-core x86 server by
> comparing the results of `perf bench all` between the kernels with and
> without this patch based on the latest linux-next kernel, which shows
> minor difference. A subset of benchmarks are listed below:
>
>                      sched/      sched/    syscall/        mem/        mem/
>                   messaging        pipe       basic      memcpy      memset
>                       (sec)       (sec)       (sec)    (GB/sec)    (GB/sec)
>
> control1              0.019       5.459       0.733   15.258789   51.398026
> control2              0.019       5.439       0.730   16.009221   48.828125
> control3              0.019       5.282       0.735   16.009221   48.828125
> control_avg           0.019       5.393       0.733   15.759077   49.684759
>
> experiment1           0.019       5.374       0.741   15.500992   46.502976
> experiment2           0.019       5.440       0.746   16.276042   51.398026
> experiment3           0.019       5.242       0.752   15.258789   51.398026
> experiment_avg        0.019       5.352       0.746   15.678608   49.766343
>
> The overhead of memory usage was measured by executing `free` after boot
> on a QEMU VM with 1GB total memory, and as expected, it's positively
> correlated with # of cache copies:
>
>               control   4 copies   8 copies  16 copies
>
> total          969.8M     968.2M     968.2M     968.2M
> used            20.0M      21.9M      24.1M      26.7M
> free           936.9M     933.6M     931.4M     928.6M
> available      932.2M     928.8M     926.6M     923.9M
>
> Signed-off-by: GONG, Ruiqi <gongruiqi@xxxxxxxxxxxxxxx>
> Co-developed-by: Xiu Jianfeng <xiujianfeng@xxxxxxxxxx>
> Signed-off-by: Xiu Jianfeng <xiujianfeng@xxxxxxxxxx>

I think this looks really good. Thanks for the respin! Some
nits/comments/questions below, but I think this can land and get
incrementally improved.
Please consider it:

Reviewed-by: Kees Cook <keescook@xxxxxxxxxxxx>

> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 791f7453a04f..b7a5387f0dad 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -19,6 +19,9 @@
>  #include <linux/workqueue.h>
>  #include <linux/percpu-refcount.h>
>
> +#ifdef CONFIG_RANDOM_KMALLOC_CACHES
> +#include <linux/hash.h>
> +#endif

I think this can just be included unconditionally, yes?

> [...]
> +extern unsigned long random_kmalloc_seed;
> +
> +static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags, unsigned long caller)
>  {
>          /*
>           * The most common case is KMALLOC_NORMAL, so test for it
>           * with a single branch for all the relevant flags.
>           */
>          if (likely((flags & KMALLOC_NOT_NORMAL_BITS) == 0))
> +#ifdef CONFIG_RANDOM_KMALLOC_CACHES
> +                return KMALLOC_RANDOM_START + hash_64(caller ^ random_kmalloc_seed,
> +                                                      CONFIG_RANDOM_KMALLOC_CACHES_BITS);
> +#else
>                  return KMALLOC_NORMAL;
> +#endif

The commit log talks about having no runtime lookup, but that's not
entirely true, given this routine. And xor and a hash_64... I wonder how
expensive this is compared to some kind of constant expression that
could be computed at build time... (the xor should stay, but that's
"cheap").

>
>          /*
>           * At least one of the flags has to be set. Their priorities in
> @@ -577,7 +589,7 @@ static __always_inline __alloc_size(1) void *kmalloc(size_t size, gfp_t flags)
>
>                  index = kmalloc_index(size);
>                  return kmalloc_trace(
> -                                kmalloc_caches[kmalloc_type(flags)][index],
> +                                kmalloc_caches[kmalloc_type(flags, _RET_IP_)][index],
>                                  flags, size);
>          }
>          return __kmalloc(size, flags);
> @@ -593,7 +605,7 @@ static __always_inline __alloc_size(1) void *kmalloc_node(size_t size, gfp_t fla
>
>                  index = kmalloc_index(size);
>                  return kmalloc_node_trace(
> -                                kmalloc_caches[kmalloc_type(flags)][index],
> +                                kmalloc_caches[kmalloc_type(flags, _RET_IP_)][index],
>                                  flags, node, size);
>          }
>          return __kmalloc_node(size, flags, node);

The use of _RET_IP_ is generally fine here, but I wonder about some of
the allocation wrappers (like devm_kmalloc(), etc). I think those aren't
being bucketed correctly? Have you checked that?

> [...]
> @@ -776,12 +781,44 @@ EXPORT_SYMBOL(kmalloc_size_roundup);
>  #define KMALLOC_RCL_NAME(sz)
>  #endif
>
> +#ifdef CONFIG_RANDOM_KMALLOC_CACHES
> +#define __KMALLOC_RANDOM_CONCAT(a, b) a ## b
> +#define KMALLOC_RANDOM_NAME(N, sz) __KMALLOC_RANDOM_CONCAT(KMA_RAND_, N)(sz)
> +#if CONFIG_RANDOM_KMALLOC_CACHES_BITS >= 1
> +#define KMA_RAND_1(sz) .name[KMALLOC_RANDOM_START + 0] = "kmalloc-random-01-" #sz,

I wonder if this name is getting too long? Should "random" be "rnd" ?
*shrug*

> [...]
> +#define KMA_RAND_16(sz) KMA_RAND_15(sz) .name[KMALLOC_RANDOM_START + 15] = "kmalloc-random-16-" #sz,

And if we wanted to save another character, this could be numbered 0-f,
but I defer these aesthetics to Vlastimil. :)

-Kees

-- 
Kees Cook