On 11/26/24 16:09, Vlastimil Babka wrote: > On 11/26/24 15:53, Ryan Roberts wrote: >> On 26/11/2024 12:36, Vlastimil Babka wrote: >>> On 11/26/24 13:18, Ryan Roberts wrote: >>>> On 14/11/2024 10:09, Vlastimil Babka wrote: >>>>> On 11/1/24 21:16, Dave Kleikamp wrote: >>>>>> When boot-time page size is enabled, the test against KMALLOC_MAX_CACHE_SIZE >>>>>> is no longer optimized out with a constant size, so a build bug may >>>>>> occur on a path that won't be reached. >>>>> >>>>> That's rather unfortunate, the __builtin_constant_p(size) part of >>>>> kmalloc_noprof() really expects things to resolve at compile time and it >>>>> would be better to keep it that way. >>>>> >>>>> I think it would be better if we based KMALLOC_MAX_CACHE_SIZE itself on >>>>> PAGE_SHIFT_MAX and kept it constant, instead of introducing >>>>> KMALLOC_SHIFT_HIGH_MAX only for some sanity checks. >>>>> >>>>> So if the kernel was built to support 4k to 64k, but booted as 4k, it would >>>>> still create and use kmalloc caches up to 128k. SLUB should handle that fine >>>>> (if not, please report it :) >>>> >>>> So when PAGE_SIZE_MAX=64K and PAGE_SIZE=4K, kmalloc will support up to 128K >>>> whereas before it only supported up to 8K. I was trying to avoid that since I >>>> assumed that would be costly in terms of extra memory allocated for those higher >>>> order buckets that will never be used. But I have no idea how SLUB works in >>>> practice. Perhaps memory for the cache is only lazily allocated so we won't see >>>> an issue in practice? >>> >>> Yes the e.g. 128k slabs themselves will be lazily allocated. There will be >>> some overhead with the management structures (struct kmem_cache etc) but >>> much smaller. >>> To be completely honest, some extra overhead might come to be when the slabs >>> are allocated ans later the user frees those allocations. kmalloc_large() >>> wwould return them immediately, while a regular kmem_cache will keep one or >>> more per cpu for reuse. But if that becomes a visible problem we can tune >>> those caches to discard slabs more aggressively. >> >> Sorry to keep pushing on this, now that I've actually looked at the code, I feel >> I have a slightly better understanding: >> >> void *kmalloc_noprof(size_t size, gfp_t flags) >> { >> if (__builtin_constant_p(size) && size) { >> >> if (size > KMALLOC_MAX_CACHE_SIZE) >> return __kmalloc_large_noprof(size, flags); <<< (1) >> >> index = kmalloc_index(size); >> return __kmalloc_cache_noprof(...); <<< (2) >> } >> return __kmalloc_noprof(size, flags); <<< (3) >> } >> >> So if size and KMALLOC_MAX_CACHE_SIZE are constant, we end up with this >> resolving either to a call to (1) or (2), decided at compile time. If >> KMALLOC_MAX_CACHE_SIZE is not constant, (1), (2) and the runtime conditional >> need to be kept in the function. >> >> But intuatively, I would have guessed that given the choice between the overhead >> of keeping that runtime conditional vs keeping per-cpu slab caches for extra >> sizes between 16K and 128K, then the runtime conditional would be preferable. I >> would guess that quite a bit of memory could get tied up in those caches? >> >> Why is your preference the opposite? What am I not understanding? > > +CC more slab people. > > So the above is an inline function, but constructed in a way that it should, > without further inline code, become > - a call to __kmalloc_large_noprof() for build-time constant size larger > than KMALLOC_MAX_CACHE_SIZE > - a call to __kmalloc_cache_noprof() for build-time constant size smaller > than KMALLOC_MAX_CACHE_SIZE, where the cache is picked from an array with > compile-time calculated index > - call to __kmalloc_noprof() for non-constant sizes otherwise > > If KMALLOC_MAX_CACHE_SIZE stops being build-time constant, the sensible way > to handle it would be to #ifdef or otherwise compile out away the whole "if > __builtin_constant_p(size)" part and just call __kmalloc_noprof() always, so > we don't blow the inline paths with a KMALLOC_MAX_CACHE_SIZE check leading > to choice between calling __kmalloc_large_noprof() or __kmalloc_cache_noprof(). Or maybe we could have PAGE_SIZE_MAX derived KMALLOC_MAX_CACHE_SIZE_MAX behave as the code above currently does with KMALLOC_MAX_CACHE_SIZE, and additionally have PAGE_SIZE_MIN derived KMALLOC_MAX_CACHE_SIZE_MIN, where build-time-constant size larger than KMALLOC_MAX_CACHE_SIZE_MIN (which is a compile-time test) is redirected to __kmalloc_noprof() for a run-time test. That seems like the optimum solution :) > I just don't believe we would waste so much memory with caches the extra > sizes for sizes between 16K and 128K, so would do that suggestion only if > proven wrong. But I wouldn't mind it that much if you chose it right away. > The solution earlier in this thread to patch __kmalloc_index() would be > worse than either of those two alternatives though.