On 11/26/24 15:53, Ryan Roberts wrote:
> On 26/11/2024 12:36, Vlastimil Babka wrote:
>> On 11/26/24 13:18, Ryan Roberts wrote:
>>> On 14/11/2024 10:09, Vlastimil Babka wrote:
>>>> On 11/1/24 21:16, Dave Kleikamp wrote:
>>>>> When boot-time page size is enabled, the test against KMALLOC_MAX_CACHE_SIZE
>>>>> is no longer optimized out with a constant size, so a build bug may
>>>>> occur on a path that won't be reached.
>>>>
>>>> That's rather unfortunate, the __builtin_constant_p(size) part of
>>>> kmalloc_noprof() really expects things to resolve at compile time and it
>>>> would be better to keep it that way.
>>>>
>>>> I think it would be better if we based KMALLOC_MAX_CACHE_SIZE itself on
>>>> PAGE_SHIFT_MAX and kept it constant, instead of introducing
>>>> KMALLOC_SHIFT_HIGH_MAX only for some sanity checks.
>>>>
>>>> So if the kernel was built to support 4k to 64k, but booted as 4k, it would
>>>> still create and use kmalloc caches up to 128k. SLUB should handle that fine
>>>> (if not, please report it :)
>>>
>>> So when PAGE_SIZE_MAX=64K and PAGE_SIZE=4K, kmalloc will support up to 128K
>>> whereas before it only supported up to 8K. I was trying to avoid that since I
>>> assumed it would be costly in terms of extra memory allocated for those higher
>>> order buckets that will never be used. But I have no idea how SLUB works in
>>> practice. Perhaps memory for the cache is only lazily allocated so we won't
>>> see an issue in practice?
>>
>> Yes, the e.g. 128k slabs themselves will be lazily allocated. There will be
>> some overhead from the management structures (struct kmem_cache etc.) but it
>> is much smaller.
>> To be completely honest, some extra overhead might appear once the slabs are
>> allocated and the user later frees those allocations: kmalloc_large() would
>> return them immediately, while a regular kmem_cache will keep one or more per
>> cpu for reuse. But if that becomes a visible problem we can tune those caches
>> to discard slabs more aggressively.
>
> Sorry to keep pushing on this, but now that I've actually looked at the code,
> I feel I have a slightly better understanding:
>
> void *kmalloc_noprof(size_t size, gfp_t flags)
> {
> 	if (__builtin_constant_p(size) && size) {
>
> 		if (size > KMALLOC_MAX_CACHE_SIZE)
> 			return __kmalloc_large_noprof(size, flags);	<<< (1)
>
> 		index = kmalloc_index(size);
> 		return __kmalloc_cache_noprof(...);			<<< (2)
> 	}
> 	return __kmalloc_noprof(size, flags);				<<< (3)
> }
>
> So if size and KMALLOC_MAX_CACHE_SIZE are constant, we end up with this
> resolving either to a call to (1) or (2), decided at compile time. If
> KMALLOC_MAX_CACHE_SIZE is not constant, (1), (2) and the runtime conditional
> need to be kept in the function.
>
> But intuitively, I would have guessed that given the choice between the
> overhead of keeping that runtime conditional vs keeping per-cpu slab caches
> for the extra sizes between 16K and 128K, the runtime conditional would be
> preferable. I would guess that quite a bit of memory could get tied up in
> those caches?
>
> Why is your preference the opposite? What am I not understanding?

+CC more slab people.
So the above is an inline function, but constructed in a way that it should,
without further inline code, become:

- a call to __kmalloc_large_noprof() for a build-time constant size larger
  than KMALLOC_MAX_CACHE_SIZE
- a call to __kmalloc_cache_noprof() for a build-time constant size smaller
  than KMALLOC_MAX_CACHE_SIZE, where the cache is picked from an array with a
  compile-time calculated index
- a call to __kmalloc_noprof() otherwise, i.e. for non-constant sizes

If KMALLOC_MAX_CACHE_SIZE stops being a build-time constant, the sensible way
to handle it would be to #ifdef or otherwise compile away the whole
"if (__builtin_constant_p(size))" part and just call __kmalloc_noprof()
always, so we don't bloat the inlined paths with a KMALLOC_MAX_CACHE_SIZE
check leading to a choice between calling __kmalloc_large_noprof() or
__kmalloc_cache_noprof().

I just don't believe we would waste that much memory with caches for the
extra sizes between 16K and 128K, so I would go with that suggestion only if
proven wrong. But I wouldn't mind it that much if you chose it right away.
The solution earlier in this thread of patching __kmalloc_index() would be
worse than either of those two alternatives, though.

>
>>
>>> I'm happy to make this change if you're certain it's the right approach;
>>> please confirm.
>>
>> Yes, it's a much better option than breaking the build-time-constant part
>> of kmalloc_noprof().
>>
>>>>
>>>> Maybe we could also stop adding + 1 to PAGE_SHIFT_MAX if it's >= 64k, so
>>>> the cache size is max 64k and not 128k, but that should probably be
>>>> evaluated separately from this series.
>>>
>>> I'm inferring from this that perhaps there is a memory cost to having the
>>> higher orders defined but unused.
>>
>> Yeah, as per above, it should not be too large and we could tune it down
>> if necessary.
>>
>>> Thanks,
>>> Ryan
>>>
>>>>
>>>> Vlastimil
>>>>
>>>>> Found compiling drivers/net/ethernet/qlogic/qed/qed_sriov.c
>>>>>
>>>>> Signed-off-by: Dave Kleikamp <dave.kleikamp@xxxxxxxxxx>
>>>>> ---
>>>>>
>>>>> Ryan,
>>>>>
>>>>> Please consider incorporating this fix or something similar into your
>>>>> mm patch in the boot-time page size patches.
>>>>>
>>>>>  include/linux/slab.h | 3 ++-
>>>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/include/linux/slab.h b/include/linux/slab.h
>>>>> index 9848296ca6ba..a4c7507ab8ec 100644
>>>>> --- a/include/linux/slab.h
>>>>> +++ b/include/linux/slab.h
>>>>> @@ -685,7 +685,8 @@ static __always_inline unsigned int __kmalloc_index(size_t size,
>>>>>  	if (size <= 1024 * 1024) return 20;
>>>>>  	if (size <= 2 * 1024 * 1024) return 21;
>>>>>
>>>>> -	if (!IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant)
>>>>> +	if (!IS_ENABLED(CONFIG_ARM64_BOOT_TIME_PAGE_SIZE) &&
>>>>> +	    !IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant)
>>>>>  		BUILD_BUG_ON_MSG(1, "unexpected size in kmalloc_index()");
>>>>>  	else
>>>>>  		BUG();
>>>>
>>>
>>
>
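For reference, a minimal sketch of the "compile the constant-size path away"
alternative described above, built on the kmalloc_noprof() skeleton quoted
earlier in this thread. The #ifndef CONFIG_ARM64_BOOT_TIME_PAGE_SIZE guard is
only an assumed stand-in for "KMALLOC_MAX_CACHE_SIZE is not a build-time
constant", and this is not the actual include/linux/slab.h code:

/*
 * Sketch only -- not the actual slab.h implementation.  When
 * KMALLOC_MAX_CACHE_SIZE cannot be a build-time constant (assumed here to be
 * the case when CONFIG_ARM64_BOOT_TIME_PAGE_SIZE is enabled), the whole
 * constant-size fast path is compiled away, so every caller goes through the
 * out-of-line __kmalloc_noprof(), which picks the cache (or falls back to a
 * large allocation) at runtime anyway.
 */
void *kmalloc_noprof(size_t size, gfp_t flags)
{
#ifndef CONFIG_ARM64_BOOT_TIME_PAGE_SIZE
	if (__builtin_constant_p(size) && size) {

		if (size > KMALLOC_MAX_CACHE_SIZE)
			return __kmalloc_large_noprof(size, flags);	/* (1) */

		index = kmalloc_index(size);
		return __kmalloc_cache_noprof(...);			/* (2) */
	}
#endif
	return __kmalloc_noprof(size, flags);				/* (3) */
}

With something along those lines, the only cost of a non-constant
KMALLOC_MAX_CACHE_SIZE is the unconditional out-of-line call, rather than
inlining the size check and both branches (1) and (2) into every call site.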