Re: [RFC PATCH] mm/slab: Avoid build bug for calls to kmalloc with a large constant

Vlastimil Babka <vbabka@xxxxxxx> · Tue, 26 Nov 2024 16:27:23 +0100

On 11/26/24 16:09, Vlastimil Babka wrote:
> On 11/26/24 15:53, Ryan Roberts wrote:
>> On 26/11/2024 12:36, Vlastimil Babka wrote:
>>> On 11/26/24 13:18, Ryan Roberts wrote:
>>>> On 14/11/2024 10:09, Vlastimil Babka wrote:
>>>>> On 11/1/24 21:16, Dave Kleikamp wrote:
>>>>>> When boot-time page size is enabled, the test against KMALLOC_MAX_CACHE_SIZE
>>>>>> is no longer optimized out with a constant size, so a build bug may
>>>>>> occur on a path that won't be reached.
>>>>>
>>>>> That's rather unfortunate, the __builtin_constant_p(size) part of
>>>>> kmalloc_noprof() really expects things to resolve at compile time and it
>>>>> would be better to keep it that way.
>>>>>
>>>>> I think it would be better if we based KMALLOC_MAX_CACHE_SIZE itself on
>>>>> PAGE_SHIFT_MAX and kept it constant, instead of introducing
>>>>> KMALLOC_SHIFT_HIGH_MAX only for some sanity checks.
>>>>>
>>>>> So if the kernel was built to support 4k to 64k, but booted as 4k, it would
>>>>> still create and use kmalloc caches up to 128k. SLUB should handle that fine
>>>>> (if not, please report it :)
>>>>
>>>> So when PAGE_SIZE_MAX=64K and PAGE_SIZE=4K, kmalloc will support up to 128K
>>>> whereas before it only supported up to 8K. I was trying to avoid that since I
>>>> assumed that would be costly in terms of extra memory allocated for those higher
>>>> order buckets that will never be used. But I have no idea how SLUB works in
>>>> practice. Perhaps memory for the cache is only lazily allocated so we won't see
>>>> an issue in practice?
>>> 
>>> Yes the e.g. 128k slabs themselves will be lazily allocated. There will be
>>> some overhead with the management structures (struct kmem_cache etc) but
>>> much smaller.
>>> To be completely honest, some extra overhead might come to be when the slabs
>>> are allocated ans later the user frees those allocations. kmalloc_large()
>>> wwould return them immediately, while a regular kmem_cache will keep one or
>>> more per cpu for reuse. But if that becomes a visible problem we can tune
>>> those caches to discard slabs more aggressively.
>> 
>> Sorry to keep pushing on this, now that I've actually looked at the code, I feel
>> I have a slightly better understanding:
>> 
>> void *kmalloc_noprof(size_t size, gfp_t flags)
>> {
>> 	if (__builtin_constant_p(size) && size) {
>> 		
>> 		if (size > KMALLOC_MAX_CACHE_SIZE)
>> 			return __kmalloc_large_noprof(size, flags); <<< (1)
>> 
>> 		index = kmalloc_index(size);
>> 		return __kmalloc_cache_noprof(...);   <<< (2)
>> 	}
>> 	return __kmalloc_noprof(size, flags);   <<< (3)
>> }
>> 
>> So if size and KMALLOC_MAX_CACHE_SIZE are constant, we end up with this
>> resolving either to a call to (1) or (2), decided at compile time. If
>> KMALLOC_MAX_CACHE_SIZE is not constant, (1), (2) and the runtime conditional
>> need to be kept in the function.
>> 
>> But intuatively, I would have guessed that given the choice between the overhead
>> of keeping that runtime conditional vs keeping per-cpu slab caches for extra
>> sizes between 16K and 128K, then the runtime conditional would be preferable. I
>> would guess that quite a bit of memory could get tied up in those caches?
>> 
>> Why is your preference the opposite? What am I not understanding?
> 
> +CC more slab people.
> 
> So the above is an inline function, but constructed in a way that it should,
> without further inline code, become
> - a call to __kmalloc_large_noprof() for build-time constant size larger
> than KMALLOC_MAX_CACHE_SIZE
> - a call to __kmalloc_cache_noprof() for build-time constant size smaller
> than KMALLOC_MAX_CACHE_SIZE, where the cache is picked from an array with
> compile-time calculated index
> - call to __kmalloc_noprof() for non-constant sizes otherwise
> 
> If KMALLOC_MAX_CACHE_SIZE stops being build-time constant, the sensible way
> to handle it would be to #ifdef or otherwise compile out away the whole "if
> __builtin_constant_p(size)" part and just call __kmalloc_noprof() always, so
> we don't blow the inline paths with a KMALLOC_MAX_CACHE_SIZE check leading
> to choice between calling __kmalloc_large_noprof() or __kmalloc_cache_noprof().

Or maybe we could have PAGE_SIZE_MAX derived KMALLOC_MAX_CACHE_SIZE_MAX
behave as the code above currently does with KMALLOC_MAX_CACHE_SIZE, and
additionally have PAGE_SIZE_MIN derived KMALLOC_MAX_CACHE_SIZE_MIN, where
build-time-constant size larger than KMALLOC_MAX_CACHE_SIZE_MIN (which is a
compile-time test) is redirected to __kmalloc_noprof() for a run-time test.

That seems like the optimum solution :)

> I just don't believe we would waste so much memory with caches the extra
> sizes for sizes between 16K and 128K, so would do that suggestion only if
> proven wrong. But I wouldn't mind it that much if you chose it right away.
> The solution earlier in this thread to patch __kmalloc_index() would be
> worse than either of those two alternatives though.