Re: [PATCH V2] mm, page_alloc: fix core hung in free_pcppages_bulk()

On 11.08.20 14:58, Charan Teja Reddy wrote:
> The following race is observed when memory blocks of the movable zone
> are repeatedly onlined and offlined, with a delay between two successive
> onlining operations.
> 
> P1						P2
> 
> Online the first memory block in
> the movable zone. The pcp struct
> values are initialized to default
> values, i.e., pcp->high = 0 &
> pcp->batch = 1.
> 
> 					Allocate the pages from the
> 					movable zone.
> 
> Try to online the second memory
> block in the movable zone; it has
> entered online_pages() but has
> yet to call zone_pcp_update().
> 					This process has entered
> 					the exit path, so it tries
> 					to release its order-0 pages
> 					to the pcp lists through
> 					free_unref_page_commit().
> 					As pcp->high = 0 and
> 					pcp->count = 1, it proceeds
> 					to call free_pcppages_bulk().
> Update the pcp values; the new
> values are, say,
> pcp->high = 378, pcp->batch = 63.
> 					Read the pcp's batch value using
> 					READ_ONCE() and pass it to
> 					free_pcppages_bulk(). The pcp
> 					values at this point are
> 					batch = 63 and count = 1.
> 
> 					Since the number of pages on the
> 					pcp lists is less than ->batch,
> 					it gets stuck in the
> 					while (list_empty(list)) loop
> 					with interrupts disabled,
> 					causing a core hang.
> 
> Avoid this by ensuring that free_pcppages_bulk() never tries to free more
> pages than are actually on the pcp lists.
> 
> The race is fairly easy to reproduce without [1], because the pcp
> values are not updated when the first memory block is onlined, which
> leaves P2 a wide enough race window between its alloc+free and the pcp
> update done when the second memory block is onlined.
> 
> With [1], the race still exists, but the window is much narrower
> because the pcp struct values are already updated when the first
> memory block is onlined.
> 
> [1]: https://patchwork.kernel.org/patch/11696389/
> 

IIUC, this is not limited to the movable zone; it could also happen in
corner cases with the normal zone (e.g., hotplug to a node that only has
DMA memory, or no other memory yet).

> Signed-off-by: Charan Teja Reddy <charante@xxxxxxxxxxxxxx>
> ---
> 
> v1: https://patchwork.kernel.org/patch/11707637/
> 
>  mm/page_alloc.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index e4896e6..839039f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1304,6 +1304,11 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>  	struct page *page, *tmp;
>  	LIST_HEAD(head);
>  
> +	/*
> +	 * Ensure a proper count is passed; otherwise we would get stuck in
> +	 * the while (list_empty(list)) loop below.
> +	 */
> +	count = min(pcp->count, count);
>  	while (count) {
>  		struct list_head *list;
>  
> 

Fixes: and Cc: stable... tags?

-- 
Thanks,

David / dhildenb
