Re: [PATCH] mm, page_alloc: reduce page alloc/free sanity checks

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 4/5/23 14:45, Mel Gorman wrote:
> On Thu, Feb 16, 2023 at 10:51:31AM +0100, Vlastimil Babka wrote:
>> Historically, we have performed sanity checks on all struct pages being
>> allocated or freed, making sure they have no unexpected page flags or
>> certain field values. This can detect insufficient cleanup and some
>> cases of use-after-free, although on its own it can't always identify
>> the culprit. The result is a warning and the "bad page" being leaked.
>> 
>> The checks do need some cpu cycles, so in 4.7 with commits 479f854a207c
>> ("mm, page_alloc: defer debugging checks of pages allocated from the
>> PCP") and 4db7548ccbd9 ("mm, page_alloc: defer debugging checks of freed
>> pages until a PCP drain") they were no longer performed in the hot paths
>> when allocating and freeing from pcplists, but only when pcplists are
>> bypassed, refilled or drained. For debugging purposes, with
>> CONFIG_DEBUG_VM enabled the checks were instead still done in the
>> hot paths and not when refilling or draining pcplists.
>> 
>> With 4462b32c9285 ("mm, page_alloc: more extensive free page checking
>> with debug_pagealloc"), enabling debug_pagealloc also moved the sanity
>> checks back to hot pahs. When both debug_pagealloc and CONFIG_DEBUG_VM
>> are enabled, the checks are done both in hotpaths and pcplist
>> refill/drain.
>> 
>> Even though the non-debug default today might seem to be a sensible
>> tradeoff between overhead and ability to detect bad pages, on closer
>> look it's arguably not. As most allocations go through the pcplists,
>> catching any bad pages when refilling or draining pcplists has only a
>> small chance, insufficient for debugging or serious hardening purposes.
>> On the other hand the cost of the checks is concentrated in the already
>> expensive drain/refill batching operations, and those are done under the
>> often contended zone lock. That was recently identified as an issue for
>> page allocation and the zone lock contention reduced by moving the
>> checks outside of the locked section with a patch "mm: reduce lock
>> contention of pcp buffer refill", but the cost of the checks is still
>> visible compared to their removal [1]. In the pcplist draining path
>> free_pcppages_bulk() the checks are still done under zone->lock.
>> 
>> Thus, remove the checks from pcplist refill and drain paths completely.
>> Introduce a static key check_pages_enabled to control checks during page
>> allocation a freeing (whether pcplist is used or bypassed). The static
>> key is enabled if either is true:
>> - kernel is built with CONFIG_DEBUG_VM=y (debugging)
>> - debug_pagealloc or page poisoning is boot-time enabled (debugging)
>> - init_on_alloc or init_on_free is boot-time enabled (hardening)
>> 
>> The resulting user visible changes:
>> - no checks when draining/refilling pcplists - less overhead, with
>>   likely no practical reduction of ability to catch bad pages
>> - no checks when bypassing pcplists in default config (no
>>   debugging/hardening) - less overhead etc. as above
>> - on typical hardened kernels [2], checks are now performed on each page
>>   allocation/free (previously only when bypassing/draining/refilling
>>   pcplists) - the init_on_alloc/init_on_free enabled should be sufficient
>>   indication for preferring more costly alloc/free operations for
>>   hardening purposes and we shouldn't need to introduce another toggle
>> - code (various wrappers) removal and simplification
>> 
>> [1] https://lore.kernel.org/all/68ba44d8-6899-c018-dcb3-36f3a96e6bea@xxxxxxxxxxxxxxxxxxx/
>> [2] https://lore.kernel.org/all/63ebc499.a70a0220.9ac51.29ea@xxxxxxxxxxxxx/
>> 
>> Reported-by: Alexander Halbuer <halbuer@xxxxxxxxxxxxxxxxxxx>
>> Reported-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>> Signed-off-by: Vlastimil Babka <vbabka@xxxxxxx>
> 
> Acked-by: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>

Thanks.

> Some minor comments below
> 
>> @@ -1432,9 +1448,11 @@ static __always_inline bool free_pages_prepare(struct page *page,
>>  		for (i = 1; i < (1 << order); i++) {
>>  			if (compound)
>>  				bad += free_tail_pages_check(page, page + i);
> 
> free_tail_pages_check is also a function that only does something useful
> when CONFIG_DEBUG_VM is set. While it might be outside the scope of the
> patch, it might also benefit from check_pages_enabled checks?

True, will send a followup. Will also rename it to free_tail_page_prepare()
as it in fact also combines a preparation component with optional checks
component.
Will remove the unlikely()s you pointed out as well.

>> -			if (unlikely(free_page_is_bad(page + i))) {
>> -				bad++;
>> -				continue;
>> +			if (static_branch_unlikely(&check_pages_enabled)) {
>> +				if (unlikely(free_page_is_bad(page + i))) {
>> +					bad++;
>> +					continue;
>> +				}
>>  			}
>>  			(page + i)->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
>>  		}
> 
> The unlikely() within a static_branch_unlikely probably adds very little
> given the block is so tiny. 
> 
>> @@ -2392,56 +2369,20 @@ static inline int check_new_page(struct page *page)
>>  	return 1;
>>  }
>>  
>> -static bool check_new_pages(struct page *page, unsigned int order)
>> +static inline bool check_new_pages(struct page *page, unsigned int order)
>>  {
>> -	int i;
>> -	for (i = 0; i < (1 << order); i++) {
>> -		struct page *p = page + i;
>> +	if (static_branch_unlikely(&check_pages_enabled)) {
>> +		for (int i = 0; i < (1 << order); i++) {
>> +			struct page *p = page + i;
>>  
>> -		if (unlikely(check_new_page(p)))
>> -			return true;
>> +			if (unlikely(check_new_page(p)))
>> +				return true;
>> +		}
>>  	}
>>  
> 
> unlikely() within static_branch_unlikely probably adds very little.
> 
> Otherwise, looks good. I agree that with changes over time that the ability
> of the checks to detect anything is reduced and it's probably at the point
> where it can only detect a very specific bit corruption instead of broken
> code. Commit 44042b449872 ("mm/page_alloc: allow high-order pages to be
> stored on the per-cpu lists") also likely reduced the ability of the checks
> to find anything.
> 





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux