On 26/11/2024 15:27, Vlastimil Babka wrote: > On 11/26/24 16:09, Vlastimil Babka wrote: >> On 11/26/24 15:53, Ryan Roberts wrote: >>> On 26/11/2024 12:36, Vlastimil Babka wrote: >>>> On 11/26/24 13:18, Ryan Roberts wrote: >>>>> On 14/11/2024 10:09, Vlastimil Babka wrote: >>>>>> On 11/1/24 21:16, Dave Kleikamp wrote: >>>>>>> When boot-time page size is enabled, the test against KMALLOC_MAX_CACHE_SIZE >>>>>>> is no longer optimized out with a constant size, so a build bug may >>>>>>> occur on a path that won't be reached. >>>>>> >>>>>> That's rather unfortunate, the __builtin_constant_p(size) part of >>>>>> kmalloc_noprof() really expects things to resolve at compile time and it >>>>>> would be better to keep it that way. >>>>>> >>>>>> I think it would be better if we based KMALLOC_MAX_CACHE_SIZE itself on >>>>>> PAGE_SHIFT_MAX and kept it constant, instead of introducing >>>>>> KMALLOC_SHIFT_HIGH_MAX only for some sanity checks. >>>>>> >>>>>> So if the kernel was built to support 4k to 64k, but booted as 4k, it would >>>>>> still create and use kmalloc caches up to 128k. SLUB should handle that fine >>>>>> (if not, please report it :) >>>>> >>>>> So when PAGE_SIZE_MAX=64K and PAGE_SIZE=4K, kmalloc will support up to 128K >>>>> whereas before it only supported up to 8K. I was trying to avoid that since I >>>>> assumed that would be costly in terms of extra memory allocated for those higher >>>>> order buckets that will never be used. But I have no idea how SLUB works in >>>>> practice. Perhaps memory for the cache is only lazily allocated so we won't see >>>>> an issue in practice? >>>> >>>> Yes the e.g. 128k slabs themselves will be lazily allocated. There will be >>>> some overhead with the management structures (struct kmem_cache etc) but >>>> much smaller. >>>> To be completely honest, some extra overhead might come to be when the slabs >>>> are allocated ans later the user frees those allocations. kmalloc_large() >>>> wwould return them immediately, while a regular kmem_cache will keep one or >>>> more per cpu for reuse. But if that becomes a visible problem we can tune >>>> those caches to discard slabs more aggressively. >>> >>> Sorry to keep pushing on this, now that I've actually looked at the code, I feel >>> I have a slightly better understanding: >>> >>> void *kmalloc_noprof(size_t size, gfp_t flags) >>> { >>> if (__builtin_constant_p(size) && size) { >>> >>> if (size > KMALLOC_MAX_CACHE_SIZE) >>> return __kmalloc_large_noprof(size, flags); <<< (1) >>> >>> index = kmalloc_index(size); >>> return __kmalloc_cache_noprof(...); <<< (2) >>> } >>> return __kmalloc_noprof(size, flags); <<< (3) >>> } >>> >>> So if size and KMALLOC_MAX_CACHE_SIZE are constant, we end up with this >>> resolving either to a call to (1) or (2), decided at compile time. If >>> KMALLOC_MAX_CACHE_SIZE is not constant, (1), (2) and the runtime conditional >>> need to be kept in the function. >>> >>> But intuatively, I would have guessed that given the choice between the overhead >>> of keeping that runtime conditional vs keeping per-cpu slab caches for extra >>> sizes between 16K and 128K, then the runtime conditional would be preferable. I >>> would guess that quite a bit of memory could get tied up in those caches? >>> >>> Why is your preference the opposite? What am I not understanding? >> >> +CC more slab people. >> >> So the above is an inline function, but constructed in a way that it should, >> without further inline code, become >> - a call to __kmalloc_large_noprof() for build-time constant size larger >> than KMALLOC_MAX_CACHE_SIZE >> - a call to __kmalloc_cache_noprof() for build-time constant size smaller >> than KMALLOC_MAX_CACHE_SIZE, where the cache is picked from an array with >> compile-time calculated index >> - call to __kmalloc_noprof() for non-constant sizes otherwise >> >> If KMALLOC_MAX_CACHE_SIZE stops being build-time constant, the sensible way >> to handle it would be to #ifdef or otherwise compile out away the whole "if >> __builtin_constant_p(size)" part and just call __kmalloc_noprof() always, so >> we don't blow the inline paths with a KMALLOC_MAX_CACHE_SIZE check leading >> to choice between calling __kmalloc_large_noprof() or __kmalloc_cache_noprof(). > > Or maybe we could have PAGE_SIZE_MAX derived KMALLOC_MAX_CACHE_SIZE_MAX > behave as the code above currently does with KMALLOC_MAX_CACHE_SIZE, and > additionally have PAGE_SIZE_MIN derived KMALLOC_MAX_CACHE_SIZE_MIN, where > build-time-constant size larger than KMALLOC_MAX_CACHE_SIZE_MIN (which is a > compile-time test) is redirected to __kmalloc_noprof() for a run-time test. > > That seems like the optimum solution :) Yes; that feels like the better approach to me. I'll implement this by default unless anyone else objects. > >> I just don't believe we would waste so much memory with caches the extra >> sizes for sizes between 16K and 128K, so would do that suggestion only if >> proven wrong. But I wouldn't mind it that much if you chose it right away. >> The solution earlier in this thread to patch __kmalloc_index() would be >> worse than either of those two alternatives though. > >