On 6/19/24 9:33 PM, Kees Cook wrote:
> Dedicated caches are available for fixed size allocations via
> kmem_cache_alloc(), but for dynamically sized allocations there is only
> the global kmalloc API's set of buckets available. This means it isn't
> possible to separate specific sets of dynamically sized allocations into
> a separate collection of caches.
>
> This leads to a use-after-free exploitation weakness in the Linux
> kernel since many heap memory spraying/grooming attacks depend on using
> userspace-controllable dynamically sized allocations to collide with
> fixed size allocations that end up in same cache.
>
> While CONFIG_RANDOM_KMALLOC_CACHES provides a probabilistic defense
> against these kinds of "type confusion" attacks, including for fixed
> same-size heap objects, we can create a complementary deterministic
> defense for dynamically sized allocations that are directly user
> controlled. Addressing these cases is limited in scope, so isolating these
> kinds of interfaces will not become an unbounded game of whack-a-mole. For
> example, many pass through memdup_user(), making isolation there very
> effective.
>
> In order to isolate user-controllable dynamically-sized
> allocations from the common system kmalloc allocations, introduce
> kmem_buckets_create(), which behaves like kmem_cache_create(). Introduce
> kmem_buckets_alloc(), which behaves like kmem_cache_alloc(). Introduce
> kmem_buckets_alloc_track_caller() for where caller tracking is
> needed. Introduce kmem_buckets_valloc() for cases where vmalloc fallback
> is needed.
>
> This can also be used in the future to extend allocation profiling's use
> of code tagging to implement per-caller allocation cache isolation[1]
> even for dynamic allocations.
>
> Memory allocation pinning[2] is still needed to plug the Use-After-Free
> cross-allocator weakness, but that is an existing and separate issue
> which is complementary to this improvement. Development continues for
> that feature via the SLAB_VIRTUAL[3] series (which could also provide
> guard pages -- another complementary improvement).
>
> Link: https://lore.kernel.org/lkml/202402211449.401382D2AF@keescook [1]
> Link: https://googleprojectzero.blogspot.com/2021/10/how-simple-linux-kernel-memory.html [2]
> Link: https://lore.kernel.org/lkml/20230915105933.495735-1-matteorizzo@xxxxxxxxxx/ [3]
> Signed-off-by: Kees Cook <kees@xxxxxxxxxx>
> ---
>  include/linux/slab.h | 13 ++++++++
>  mm/slab_common.c     | 78 ++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 91 insertions(+)
>
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 8d0800c7579a..3698b15b6138 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -549,6 +549,11 @@ void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
>
>  void kmem_cache_free(struct kmem_cache *s, void *objp);
>
> +kmem_buckets *kmem_buckets_create(const char *name, unsigned int align,
> +				  slab_flags_t flags,
> +				  unsigned int useroffset, unsigned int usersize,
> +				  void (*ctor)(void *));

I'd drop the ctor, as I can't imagine how it would be used with variable-sized allocations. Probably also "align" doesn't make much sense, since we're just copying the kmalloc cache sizes and their implicit alignment of any power-of-two allocations. I don't think any current kmalloc user would suddenly need either of those as you convert it to buckets, and definitely not any user converted automatically by the code tagging.
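
For reference, a rough sketch of how a caller might use the proposed API as described in the changelog. The subsystem name "msg", the bucket-set name "msg_msg", the SLAB_ACCOUNT flag, and the msg_buckets_init()/msg_alloc() helpers are all illustrative, not taken from the patch:

	#include <linux/slab.h>
	#include <linux/init.h>
	#include <linux/errno.h>

	/* Illustrative only: one dedicated set of kmalloc-style buckets for
	 * this subsystem's user-controlled, variable-sized allocations. */
	static kmem_buckets *msg_buckets;

	static int __init msg_buckets_init(void)
	{
		/* Arguments mirror kmem_cache_create() per the quoted
		 * declaration: name, align, flags, useroffset/usersize, ctor. */
		msg_buckets = kmem_buckets_create("msg_msg", 0, SLAB_ACCOUNT,
						  0, 0, NULL);
		return msg_buckets ? 0 : -ENOMEM;
	}

	/* A userspace-controlled, dynamically sized allocation now lands in
	 * the dedicated buckets instead of the shared kmalloc-<size> caches. */
	static void *msg_alloc(size_t len)
	{
		return kmem_buckets_alloc(msg_buckets, len, GFP_KERNEL);
	}

Note that with the ctor and align parameters dropped as suggested above, such callers would pass 0/NULL for them anyway, so the conversion from plain kmalloc() stays a one-line change plus the bucket-set creation.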