RE: Deprecating and removing SLOB

David Laight <David.Laight@xxxxxxxxxx> · Fri, 11 Nov 2022 09:37:07 +0000

From: Matthew Wilcox
> Sent: 10 November 2022 16:20
> 
> On Thu, Nov 10, 2022 at 08:31:31AM +0100, Vlastimil Babka wrote:
> > >     octeon-hcd will crash the kernel when SLOB is used. This usually happens
> > >     after the 18-byte control transfer when a device descriptor is read.
> > >     The DMA engine is always transferring full 32-bit words and if the
> > >     transfer is shorter, some random garbage appears after the buffer.
> > >     The problem is not visible with SLUB since it rounds up the allocations
> > >     to word boundary, and the extra bytes will go undetected.
> >
> > Ah, actually it wouldn't *now* as SLUB would make the allocation fall into
> > kmalloc-32 cache and only add redzone beyond 32 bytes. But with upcoming
> > changes by Feng Tang, this should work.
> 
> This is kind of "if a bug stings a tree in a forest, does it hurt"
> problem.  If all allocations of 18 bytes are rounded up to 20 or more
> bytes, then it doesn't matter that the device has this bug.  Sure, it
> may end up hurting in the future if we decide to create 18-byte slab
> caches, but it's not actually going to affect anything today (and we
> seem to be moving towards less precision in order to get more
> performance)

Yes, even on dma-coherent systems, allocated blocks have to be
moderately aligned, so the space after an 18-byte block can't be
used by another allocation.
I also doubt there is any benefit (and suspect there would be many
bugs) from allowing 2-byte alignment on m68k.
So 'overwriting up to a whole number of words' can reasonably be
expected not to cause any real bugs.

x86 (even 32-bit) probably requires 16-byte alignment (for some
corner cases), which is fine for a power-of-2 allocator that doesn't
add a header.
(Although 1-, 2-, 4- and 8-byte allocations are valid.)

To reduce memory wastage, what you really don't want is an allocator
that adds a header/trailer and then rounds up to a power of 2.
Coders think in binary and write kmalloc(256), not kmalloc(200), so
rounding 256 plus the header up to 512 is rather wasteful.
(Search for kmalloc(PAGE_SIZE+1) :-)

I also think that one of the allocators only cuts pages into
power-of-2 sizes.
It is probably sensible to return cache-aligned (probably 64-byte)
buffers for requests larger than a cache line.
But cache alignment doesn't need power-of-2 sizes: a 4k page can be
split into 21 192-byte buffers (with 64 bytes left over), against 16
256-byte ones.
As well as using less memory for allocations between 129 and 192 bytes,
this may reduce pressure on the d-cache by evening out cache line usage.

	David
