On Wed, Jul 3, 2024 at 9:25 AM Vlastimil Babka <vbabka@xxxxxxx> wrote:
>
> Slab allocators have been guaranteeing natural alignment for
> power-of-two sizes since commit 59bb47985c1d ("mm, sl[aou]b: guarantee
> natural alignment for kmalloc(power-of-two)"), while any other sizes
> are guaranteed to be aligned only to ARCH_KMALLOC_MINALIGN bytes
> (although in practice they are aligned more than that in non-debug
> scenarios).
>
> Rust's allocator API specifies size and alignment per allocation, which
> have to satisfy the following rules, per Alice Ryhl [1]:
>
> 1. The alignment is a power of two.
> 2. The size is non-zero.
> 3. When you round up the size to the next multiple of the alignment,
>    then it must not overflow the signed type isize / ssize_t.
>
> In order to map this to kmalloc()'s guarantees, some requested
> allocation sizes have to be padded to the next power-of-two size [2].
> For example, an allocation of size 96 and alignment of 32 will be
> padded to an allocation of size 128, because the existing kmalloc-96
> bucket doesn't guarantee alignment above ARCH_KMALLOC_MINALIGN. Without
> slab debugging active, the layout of the kmalloc-96 slabs however
> naturally aligns the objects to 32 bytes, so extending the size to 128
> bytes is wasteful.
>
> To improve the situation we can extend the kmalloc() alignment
> guarantees in a way that
>
> 1) doesn't change the current slab layout (and thus does not increase
>    internal fragmentation) when slab debugging is not active
> 2) reduces waste in the Rust allocator use case
> 3) is a superset of the current guarantee for power-of-two sizes.
>
> The extended guarantee is that alignment is at least the largest
> power-of-two divisor of the requested size. For power-of-two sizes the
> largest divisor is the size itself, but let's keep this case documented
> separately for clarity.
>
> For the current kmalloc size buckets, it means kmalloc-96 will
> guarantee alignment of 32 bytes and kmalloc-192 will guarantee 64
> bytes.
>
> This covers rules 1 and 2 above of Rust's API as long as the size is a
> multiple of the alignment. The Rust layer should now only need to round
> up the size to the next multiple if it isn't, while enforcing rule 3.
>
> Implementation-wise, this changes the alignment calculation in
> create_boot_cache(). While at it, also do the calculation only for
> caches with the SLAB_KMALLOC flag, because the function is also used to
> create the initial kmem_cache and kmem_cache_node caches, where no
> alignment guarantee is necessary.
>
> In the Rust allocator's krealloc_aligned(), remove the code that padded
> sizes to the next power of two (suggested by Alice Ryhl) as it's no
> longer necessary with the new guarantees.
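
For readers following along: the largest power-of-two divisor of a size
is its lowest set bit, which is what the ffs()-based expression in the
patch below computes. A minimal userspace Rust sketch of the new
guarantee (my own illustration, not part of the patch):

    /// Alignment kmalloc() guarantees for a non-zero `size` under this
    /// patch: the lowest set bit, i.e. the largest power-of-two divisor.
    fn guaranteed_align(size: usize) -> usize {
        assert_ne!(size, 0);
        1usize << size.trailing_zeros()
    }

    fn main() {
        assert_eq!(guaranteed_align(96), 32);   // kmalloc-96
        assert_eq!(guaranteed_align(192), 64);  // kmalloc-192
        assert_eq!(guaranteed_align(128), 128); // power of two: the size itself
    }
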
> Reported-by: Alice Ryhl <aliceryhl@xxxxxxxxxx>
> Reported-by: Boqun Feng <boqun.feng@xxxxxxxxx>
> Link: https://lore.kernel.org/all/CAH5fLggjrbdUuT-H-5vbQfMazjRDpp2%2Bk3%3DYhPyS17ezEqxwcw@xxxxxxxxxxxxxx/ [1]
> Link: https://lore.kernel.org/all/CAH5fLghsZRemYUwVvhk77o6y1foqnCeDzW4WZv6ScEWna2+_jw@xxxxxxxxxxxxxx/ [2]
> Signed-off-by: Vlastimil Babka <vbabka@xxxxxxx>
> Reviewed-by: Boqun Feng <boqun.feng@xxxxxxxxx>
> Acked-by: Roman Gushchin <roman.gushchin@xxxxxxxxx>

Reviewed-by: Alice Ryhl <aliceryhl@xxxxxxxxxx>

> ---
> v2: - add Rust side change as suggested by Alice, also thanks Boqun
>       for fixups
>     - clarify that the alignment already existed (unless debugging) but
>       was not guaranteed, so there's no extra fragmentation in slab
>     - add r-b, a-b, thanks to Boqun and Roman
>
> If it's fine with Rust folks, I can put this in the slab.git tree.
>
>  Documentation/core-api/memory-allocation.rst |  6 ++++--
>  include/linux/slab.h                         |  3 ++-
>  mm/slab_common.c                             |  9 +++++----
>  rust/kernel/alloc/allocator.rs               | 19 ++++++-------------
>  4 files changed, 17 insertions(+), 20 deletions(-)
>
> diff --git a/Documentation/core-api/memory-allocation.rst b/Documentation/core-api/memory-allocation.rst
> index 1c58d883b273..8b84eb4bdae7 100644
> --- a/Documentation/core-api/memory-allocation.rst
> +++ b/Documentation/core-api/memory-allocation.rst
> @@ -144,8 +144,10 @@ configuration, but it is a good practice to use `kmalloc` for objects
>  smaller than page size.
>
>  The address of a chunk allocated with `kmalloc` is aligned to at least
> -ARCH_KMALLOC_MINALIGN bytes. For sizes which are a power of two, the
> -alignment is also guaranteed to be at least the respective size.
> +ARCH_KMALLOC_MINALIGN bytes. For sizes which are a power of two, the
> +alignment is also guaranteed to be at least the respective size. For other
> +sizes, the alignment is guaranteed to be at least the largest power-of-two
> +divisor of the size.
>
>  Chunks allocated with kmalloc() can be resized with krealloc(). Similarly
>  to kmalloc_array(): a helper for resizing arrays is provided in the form of
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index ed6bee5ec2b6..640cea6e6323 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -604,7 +604,8 @@ void *__kmalloc_large_node_noprof(size_t size, gfp_t flags, int node)
>   *
>   * The allocated object address is aligned to at least ARCH_KMALLOC_MINALIGN
>   * bytes. For @size of power of two bytes, the alignment is also guaranteed
> - * to be at least to the size.
> + * to be at least to the size. For other sizes, the alignment is guaranteed to
> + * be at least the largest power-of-two divisor of @size.
>   *
>   * The @flags argument may be one of the GFP flags defined at
>   * include/linux/gfp_types.h and described at
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 1560a1546bb1..7272ef7bc55f 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -617,11 +617,12 @@ void __init create_boot_cache(struct kmem_cache *s, const char *name,
>          s->size = s->object_size = size;
>
>          /*
> -         * For power of two sizes, guarantee natural alignment for kmalloc
> -         * caches, regardless of SL*B debugging options.
> +         * kmalloc caches guarantee alignment of at least the largest
> +         * power-of-two divisor of the size. For power-of-two sizes,
> +         * it is the size itself.
>          */
> -        if (is_power_of_2(size))
> -                align = max(align, size);
> +        if (flags & SLAB_KMALLOC)
> +                align = max(align, 1U << (ffs(size) - 1));
>          s->align = calculate_alignment(flags, align, size);
>
>  #ifdef CONFIG_HARDENED_USERCOPY
> diff --git a/rust/kernel/alloc/allocator.rs b/rust/kernel/alloc/allocator.rs
> index 229642960cd1..e6ea601f38c6 100644
> --- a/rust/kernel/alloc/allocator.rs
> +++ b/rust/kernel/alloc/allocator.rs
> @@ -18,23 +18,16 @@ pub(crate) unsafe fn krealloc_aligned(ptr: *mut u8, new_layout: Layout, flags: F
>      // Customized layouts from `Layout::from_size_align()` can have size < align, so pad first.
>      let layout = new_layout.pad_to_align();
>
> -    let mut size = layout.size();
> -
> -    if layout.align() > bindings::ARCH_SLAB_MINALIGN {
> -        // The alignment requirement exceeds the slab guarantee, thus try to enlarge the size
> -        // to use the "power-of-two" size/alignment guarantee (see comments in `kmalloc()` for
> -        // more information).
> -        //
> -        // Note that `layout.size()` (after padding) is guaranteed to be a multiple of
> -        // `layout.align()`, so `next_power_of_two` gives enough alignment guarantee.
> -        size = size.next_power_of_two();
> -    }
> +    // Note that `layout.size()` (after padding) is guaranteed to be a multiple of `layout.align()`
> +    // which together with the slab guarantees means the `krealloc` will return a properly aligned
> +    // object (see comments in `kmalloc()` for more information).
> +    let size = layout.size();

The size is a multiple of the alignment due to the `pad_to_align` call
above, which rounds up the size to ensure that this is the case.
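For example (a quick userspace sketch with made-up sizes, not the
kernel code):

    use core::alloc::Layout;

    fn main() {
        // size 20 is not a multiple of align 16, so pad_to_align()
        // rounds the size up to 32
        let l = Layout::from_size_align(20, 16).unwrap().pad_to_align();
        assert_eq!((l.size(), l.align()), (32, 16));

        // size 96 is already a multiple of align 32, so it stays 96 and
        // can now be served from kmalloc-96 under the new guarantee
        let l = Layout::from_size_align(96, 32).unwrap().pad_to_align();
        assert_eq!((l.size(), l.align()), (96, 32));
    }

Alice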