On 11/26/24 15:53, Ryan Roberts wrote:
> On 26/11/2024 12:36, Vlastimil Babka wrote:
>> On 11/26/24 13:18, Ryan Roberts wrote:
>>> On 14/11/2024 10:09, Vlastimil Babka wrote:
>>>> On 11/1/24 21:16, Dave Kleikamp wrote:
>>>>> When boot-time page size is enabled, the test against KMALLOC_MAX_CACHE_SIZE
>>>>> is no longer optimized out with a constant size, so a build bug may
>>>>> occur on a path that won't be reached.
>>>>
>>>> That's rather unfortunate, the __builtin_constant_p(size) part of
>>>> kmalloc_noprof() really expects things to resolve at compile time and it
>>>> would be better to keep it that way.
>>>>
>>>> I think it would be better if we based KMALLOC_MAX_CACHE_SIZE itself on
>>>> PAGE_SHIFT_MAX and kept it constant, instead of introducing
>>>> KMALLOC_SHIFT_HIGH_MAX only for some sanity checks.
>>>>
>>>> So if the kernel was built to support 4k to 64k, but booted as 4k, it would
>>>> still create and use kmalloc caches up to 128k. SLUB should handle that fine
>>>> (if not, please report it :)
>>>
>>> So when PAGE_SIZE_MAX=64K and PAGE_SIZE=4K, kmalloc will support up to 128K
>>> whereas before it only supported up to 8K. I was trying to avoid that since I
>>> assumed it would be costly in terms of extra memory allocated for those higher
>>> order buckets that will never be used. But I have no idea how SLUB works in
>>> practice. Perhaps memory for the cache is only lazily allocated so we won't
>>> see an issue in practice?
>>
>> Yes, the e.g. 128k slabs themselves will be lazily allocated. There will be
>> some overhead from the management structures (struct kmem_cache etc.) but it
>> is much smaller.
>> To be completely honest, some extra overhead might appear once the slabs are
>> allocated and the user later frees those allocations: kmalloc_large() would
>> return them immediately, while a regular kmem_cache will keep one or more per
>> cpu for reuse. But if that becomes a visible problem we can tune those caches
>> to discard slabs more aggressively.
>
> Sorry to keep pushing on this, but now that I've actually looked at the code,
> I feel I have a slightly better understanding:
>
> void *kmalloc_noprof(size_t size, gfp_t flags)
> {
> 	if (__builtin_constant_p(size) && size) {
>
> 		if (size > KMALLOC_MAX_CACHE_SIZE)
> 			return __kmalloc_large_noprof(size, flags);	<<< (1)
>
> 		index = kmalloc_index(size);
> 		return __kmalloc_cache_noprof(...);			<<< (2)
> 	}
> 	return __kmalloc_noprof(size, flags);				<<< (3)
> }
>
> So if size and KMALLOC_MAX_CACHE_SIZE are constant, we end up with this
> resolving either to a call to (1) or (2), decided at compile time. If
> KMALLOC_MAX_CACHE_SIZE is not constant, (1), (2) and the runtime conditional
> need to be kept in the function.
>
> But intuitively, I would have guessed that given the choice between the
> overhead of keeping that runtime conditional vs keeping per-cpu slab caches
> for the extra sizes between 16K and 128K, the runtime conditional would be
> preferable. I would guess that quite a bit of memory could get tied up in
> those caches?
>
> Why is your preference the opposite? What am I not understanding?

+CC more slab people.
So the above is an inline function, but constructed in a way that it should,
without further inline code, become:

- a call to __kmalloc_large_noprof() for a build-time constant size larger
  than KMALLOC_MAX_CACHE_SIZE
- a call to __kmalloc_cache_noprof() for a build-time constant size smaller
  than KMALLOC_MAX_CACHE_SIZE, where the cache is picked from an array with a
  compile-time calculated index
- a call to __kmalloc_noprof() otherwise, i.e. for non-constant sizes

If KMALLOC_MAX_CACHE_SIZE stops being a build-time constant, the sensible way
to handle it would be to #ifdef or otherwise compile away the whole
"if (__builtin_constant_p(size))" part and just call __kmalloc_noprof()
always, so we don't bloat the inlined paths with a KMALLOC_MAX_CACHE_SIZE
check leading to a choice between calling __kmalloc_large_noprof() or
__kmalloc_cache_noprof().

I just don't believe we would waste that much memory with caches for the
extra sizes between 16K and 128K, so I would go with that suggestion only if
proven wrong. But I wouldn't mind it that much if you chose it right away.
The solution earlier in this thread of patching __kmalloc_index() would be
worse than either of those two alternatives, though.

>
>>
>>> I'm happy to make this change if you're certain it's the right approach;
>>> please confirm.
>>
>> Yes, it's a much better option than breaking the build-time-constant part
>> of kmalloc_noprof().
>>
>>>>
>>>> Maybe we could also stop adding + 1 to PAGE_SHIFT_MAX if it's >= 64k, so
>>>> the cache size is max 64k and not 128k, but that should probably be
>>>> evaluated separately from this series.
>>>
>>> I'm inferring from this that perhaps there is a memory cost to having the
>>> higher orders defined but unused.
>>
>> Yeah, as per above, it should not be too large and we could tune it down
>> if necessary.
>>
>>> Thanks,
>>> Ryan
>>>
>>>>
>>>> Vlastimil
>>>>
>>>>> Found compiling drivers/net/ethernet/qlogic/qed/qed_sriov.c
>>>>>
>>>>> Signed-off-by: Dave Kleikamp <dave.kleikamp@xxxxxxxxxx>
>>>>> ---
>>>>>
>>>>> Ryan,
>>>>>
>>>>> Please consider incorporating this fix or something similar into your
>>>>> mm patch in the boot-time page size patches.
>>>>>
>>>>>  include/linux/slab.h | 3 ++-
>>>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/include/linux/slab.h b/include/linux/slab.h
>>>>> index 9848296ca6ba..a4c7507ab8ec 100644
>>>>> --- a/include/linux/slab.h
>>>>> +++ b/include/linux/slab.h
>>>>> @@ -685,7 +685,8 @@ static __always_inline unsigned int __kmalloc_index(size_t size,
>>>>>  	if (size <= 1024 * 1024) return 20;
>>>>>  	if (size <= 2 * 1024 * 1024) return 21;
>>>>>
>>>>> -	if (!IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant)
>>>>> +	if (!IS_ENABLED(CONFIG_ARM64_BOOT_TIME_PAGE_SIZE) &&
>>>>> +	    !IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant)
>>>>>  		BUILD_BUG_ON_MSG(1, "unexpected size in kmalloc_index()");
>>>>>  	else
>>>>>  		BUG();
>>>>
>>>
>>
>
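For reference, a minimal sketch of the "compile the constant-size path away"
alternative described above, built on the kmalloc_noprof() skeleton quoted
earlier in this thread. The #ifndef CONFIG_ARM64_BOOT_TIME_PAGE_SIZE guard is
only an assumed stand-in for "KMALLOC_MAX_CACHE_SIZE is not a build-time
constant", and this is not the actual include/linux/slab.h code:

/*
 * Sketch only -- not the actual slab.h implementation.  When
 * KMALLOC_MAX_CACHE_SIZE cannot be a build-time constant (assumed here to be
 * the case when CONFIG_ARM64_BOOT_TIME_PAGE_SIZE is enabled), the whole
 * constant-size fast path is compiled away, so every caller goes through the
 * out-of-line __kmalloc_noprof(), which picks the cache (or falls back to a
 * large allocation) at runtime anyway.
 */
void *kmalloc_noprof(size_t size, gfp_t flags)
{
#ifndef CONFIG_ARM64_BOOT_TIME_PAGE_SIZE
	if (__builtin_constant_p(size) && size) {

		if (size > KMALLOC_MAX_CACHE_SIZE)
			return __kmalloc_large_noprof(size, flags);	/* (1) */

		index = kmalloc_index(size);
		return __kmalloc_cache_noprof(...);			/* (2) */
	}
#endif
	return __kmalloc_noprof(size, flags);				/* (3) */
}

With something along those lines, the only cost of a non-constant
KMALLOC_MAX_CACHE_SIZE is the unconditional out-of-line call, rather than
inlining the size check and both branches (1) and (2) into every call site.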