On 6/19/24 9:33 PM, Kees Cook wrote:
> Dedicated caches are available for fixed size allocations via
> kmem_cache_alloc(), but for dynamically sized allocations there is only
> the global kmalloc API's set of buckets available. This means it isn't
> possible to separate specific sets of dynamically sized allocations into
> a separate collection of caches.
>
> This leads to a use-after-free exploitation weakness in the Linux
> kernel since many heap memory spraying/grooming attacks depend on using
> userspace-controllable dynamically sized allocations to collide with
> fixed size allocations that end up in same cache.
>
> While CONFIG_RANDOM_KMALLOC_CACHES provides a probabilistic defense
> against these kinds of "type confusion" attacks, including for fixed
> same-size heap objects, we can create a complementary deterministic
> defense for dynamically sized allocations that are directly user
> controlled. Addressing these cases is limited in scope, so isolating these
> kinds of interfaces will not become an unbounded game of whack-a-mole. For
> example, many pass through memdup_user(), making isolation there very
> effective.
>
> In order to isolate user-controllable dynamically-sized
> allocations from the common system kmalloc allocations, introduce
> kmem_buckets_create(), which behaves like kmem_cache_create(). Introduce
> kmem_buckets_alloc(), which behaves like kmem_cache_alloc(). Introduce
> kmem_buckets_alloc_track_caller() for where caller tracking is
> needed. Introduce kmem_buckets_valloc() for cases where vmalloc fallback
> is needed.
>
> This can also be used in the future to extend allocation profiling's use
> of code tagging to implement per-caller allocation cache isolation[1]
> even for dynamic allocations.
>
> Memory allocation pinning[2] is still needed to plug the Use-After-Free
> cross-allocator weakness, but that is an existing and separate issue
> which is complementary to this improvement. Development continues for
> that feature via the SLAB_VIRTUAL[3] series (which could also provide
> guard pages -- another complementary improvement).
>
> Link: https://lore.kernel.org/lkml/202402211449.401382D2AF@keescook [1]
> Link: https://googleprojectzero.blogspot.com/2021/10/how-simple-linux-kernel-memory.html [2]
> Link: https://lore.kernel.org/lkml/20230915105933.495735-1-matteorizzo@xxxxxxxxxx/ [3]
> Signed-off-by: Kees Cook <kees@xxxxxxxxxx>
> ---
>  include/linux/slab.h | 13 ++++++++
>  mm/slab_common.c     | 78 ++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 91 insertions(+)
>
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 8d0800c7579a..3698b15b6138 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -549,6 +549,11 @@ void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
>
>  void kmem_cache_free(struct kmem_cache *s, void *objp);
>
> +kmem_buckets *kmem_buckets_create(const char *name, unsigned int align,
> +				  slab_flags_t flags,
> +				  unsigned int useroffset, unsigned int usersize,
> +				  void (*ctor)(void *));

I'd drop the ctor, as I can't imagine how it would be used with variable-sized allocations. Probably also "align" doesn't make much sense, since we're just copying the kmalloc cache sizes and their implicit alignment of any power-of-two allocations. I don't think any current kmalloc user would suddenly need either of those as you convert it to buckets, and definitely not any user converted automatically by the code tagging.
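
For reference, a rough sketch of how a caller might use the proposed API as described in the changelog. The subsystem name "msg", the bucket-set name "msg_msg", the SLAB_ACCOUNT flag, and the msg_buckets_init()/msg_alloc() helpers are all illustrative, not taken from the patch:

	#include <linux/slab.h>
	#include <linux/init.h>
	#include <linux/errno.h>

	/* Illustrative only: one dedicated set of kmalloc-style buckets for
	 * this subsystem's user-controlled, variable-sized allocations. */
	static kmem_buckets *msg_buckets;

	static int __init msg_buckets_init(void)
	{
		/* Arguments mirror kmem_cache_create() per the quoted
		 * declaration: name, align, flags, useroffset/usersize, ctor. */
		msg_buckets = kmem_buckets_create("msg_msg", 0, SLAB_ACCOUNT,
						  0, 0, NULL);
		return msg_buckets ? 0 : -ENOMEM;
	}

	/* A userspace-controlled, dynamically sized allocation now lands in
	 * the dedicated buckets instead of the shared kmalloc-<size> caches. */
	static void *msg_alloc(size_t len)
	{
		return kmem_buckets_alloc(msg_buckets, len, GFP_KERNEL);
	}

Note that with the ctor and align parameters dropped as suggested above, such callers would pass 0/NULL for them anyway, so the conversion from plain kmalloc() stays a one-line change plus the bucket-set creation.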