Re: [PATCH V2] mm, page_alloc: fix core hung in free_pcppages_bulk()

Charan Teja Kalla <charante@xxxxxxxxxxxxxx> · Wed, 12 Aug 2020 15:16:02 +0530

Thanks David for the inputs.

On 8/12/2020 2:35 AM, David Hildenbrand wrote:
> On 11.08.20 14:58, Charan Teja Reddy wrote:
>> The following race is observed when memory blocks of the movable zone are
>> repeatedly onlined and offlined, with a delay between two successive onlines.
>>
>> P1						P2
>>
>> Online the first memory block in
>> the movable zone. The pcp struct
>> values are initialized to default
>> values,i.e., pcp->high = 0 &
>> pcp->batch = 1.
>>
>> 					Allocate the pages from the
>> 					movable zone.
>>
>> Try to Online the second memory
>> block in the movable zone thus it
>> entered the online_pages() but yet
>> to call zone_pcp_update().
>> 					This process is entered into
>> 					the exit path thus it tries
>> 					to release the order-0 pages
>> 					to pcp lists through
>> 					free_unref_page_commit().
>> 					As pcp->high = 0, pcp->count = 1
>> 					proceed to call the function
>> 					free_pcppages_bulk().
>> Update the pcp values thus the
>> new pcp values are like, say,
>> pcp->high = 378, pcp->batch = 63.
>> 					Read the pcp's batch value using
>> 					READ_ONCE() and pass the same to
>> 					free_pcppages_bulk(), pcp values
>> 					passed here are, batch = 63,
>> 					count = 1.
>>
>> 					Since the number of pages on the
>> 					pcp lists is less than ->batch,
>> 					it gets stuck in the
>> 					while(list_empty(list)) loop
>> 					with interrupts disabled, thus
>> 					hanging the core.
>>
>> Avoid this by ensuring free_pcppages_bulk() is called with proper count
>> of pcp list pages.
>>
>> The race is fairly easy to reproduce without [1], because the pcp values
>> are not updated when the first memory block is onlined, which leaves a
>> wide window for P2 between alloc+free and the pcp update done when the
>> second memory block is onlined.
>>
>> With [1], the race still exists but the window is much narrower, since the
>> pcp values are updated when the first memory block itself is onlined.
>>
>> [1]: https://patchwork.kernel.org/patch/11696389/
>>
> 
> IIUC, this is not limited to the movable zone, it could also happen in
> corner cases with the normal zone (e.g., hotplug to a node that only has
> DMA memory, or no other memory yet).

Yes, that is my understanding too. I described the race only in terms of
the movable zone because that is where it was observed. We can add the
line below at the end of the patch commit message:
"This is not limited to the movable zone, it could also happen in cases
with the normal zone (e.g., hotplug to a node that only has DMA memory,
or no other memory yet)."

Just curious: do systems exist where only a DMA zone is present and the
normal zone is hot-added? I am not aware of such a thing in the
embedded world.
> 
>> Signed-off-by: Charan Teja Reddy <charante@xxxxxxxxxxxxxx>
>> ---
>>
>> v1: https://patchwork.kernel.org/patch/11707637/
>>
>>  mm/page_alloc.c | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index e4896e6..839039f 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -1304,6 +1304,11 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>>  	struct page *page, *tmp;
>>  	LIST_HEAD(head);
>>  
>> +	/*
>> +	 * Ensure proper count is passed which otherwise would stuck in the
>> +	 * below while (list_empty(list)) loop.
>> +	 */
>> +	count = min(pcp->count, count);
>>  	while (count) {
>>  		struct list_head *list;
>>  
>>
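
To make the effect of the clamp easier to see, here is a minimal user-space
sketch of the list scan. It is only a model under simplifying assumptions, not
the kernel code: struct pcp_model and free_bulk_model() are hypothetical names,
the lists are reduced to page counters, one page is freed per scan, and the
model returns -1 at the point where the real loop would keep spinning.

```c
#include <assert.h>

#define MIGRATE_PCPTYPES 3

/* Hypothetical stand-in for struct per_cpu_pages: only list lengths are kept. */
struct pcp_model {
	int count;                   /* total pages on all lists */
	int lists[MIGRATE_PCPTYPES]; /* pages on each migratetype list */
};

/*
 * Model of the bulk-free loop. Returns the number of pages freed, or -1
 * when every list is empty while count is still non-zero, which is where
 * the real loop would spin forever with interrupts disabled.
 * 'clamp' selects whether count is first limited to pcp->count.
 */
static int free_bulk_model(struct pcp_model *pcp, int count, int clamp)
{
	int migratetype = 0;
	int freed = 0;

	assert(count >= 0 && pcp->count >= 0);

	if (clamp && count > pcp->count)
		count = pcp->count;

	while (count) {
		int scanned = 0;

		/* Round-robin search for a non-empty list. */
		do {
			if (++migratetype == MIGRATE_PCPTYPES)
				migratetype = 0;
			/* Model-only bail-out; the kernel loop has none. */
			if (++scanned > MIGRATE_PCPTYPES)
				return -1;
		} while (pcp->lists[migratetype] == 0);

		pcp->lists[migratetype]--;
		pcp->count--;
		count--;
		freed++;
	}
	return freed;
}
```

With one page on the lists and a stale batch of 63 (the values from the race
above), the unclamped call hits the spin condition after freeing the single
page, while the clamped call frees that page and returns.
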
> 
> Fixes: and Cc: stable... tags?

Fixes: 5f8dcc21211a ("page-allocator: split per-cpu list into
one-list-per-migrate-type")
Cc: <stable@xxxxxxxxxxxxxxx> [2.6+]

Should I send a V3 that includes these tags?
> 

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
Forum, a Linux Foundation Collaborative Project