Re: [PATCH V2] mm, page_alloc: fix core hung in free_pcppages_bulk()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu 13-08-20 22:57:32, Charan Teja Kalla wrote:
> Thanks Michal.
> 
> On 8/13/2020 10:00 PM, Michal Hocko wrote:
> > On Thu 13-08-20 21:51:29, Charan Teja Kalla wrote:
> >> Thanks Michal for comments.
> >>
> >> On 8/13/2020 5:11 PM, Michal Hocko wrote:
> >>> On Tue 11-08-20 18:28:23, Charan Teja Reddy wrote:
> >>> [...]
> >>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >>>> index e4896e6..839039f 100644
> >>>> --- a/mm/page_alloc.c
> >>>> +++ b/mm/page_alloc.c
> >>>> @@ -1304,6 +1304,11 @@ static void free_pcppages_bulk(struct zone *zone, int count,
> >>>>  	struct page *page, *tmp;
> >>>>  	LIST_HEAD(head);
> >>>>  
> >>>> +	/*
> >>>> +	 * Ensure proper count is passed which otherwise would stuck in the
> >>>> +	 * below while (list_empty(list)) loop.
> >>>> +	 */
> >>>> +	count = min(pcp->count, count);
> >>>>  	while (count) {
> >>>>  		struct list_head *list;
> >>>
> >>>
> >>> How does this prevent the race actually?
> >>
> >> This doesn't prevent the race. This only fixes the core hung(as this is
> >> called with spin_lock_irq()) caused by the race condition. This core
> >> hung is because of incorrect count value is passed to the
> >> free_pcppages_bulk() function.
> > 
> > Let me ask differently. What does enforce that the count and lists do
> > not get out of sync in the loop. 
> 
> count value is updated whenever an order-0 page is being added to the
> pcp lists through free_unref_page_commit(), which is being called with
> both interrupts, premption disabled.
> static void free_unref_page_commit(struct page *page, {
> 	....
> 	list_add(&page->lru, &pcp->lists[migratetype]);
> 	pcp->count++
> }
> 
> As these are pcp lists, they only gets touched by another process when
> this process is context switched, which happens only after enabling
> premption or interrupts. So, as long as process is operating on these
> pcp lists in free_unref_page_commit function, the count and lists are
> always synced.
> 
> However, the problem here is not that the count and lists are being out
> of sync. They do always in sync, as explained above. It is with the
> asking free_pcppages_bulk() to free the pages more than what is present
> in the pcp lists which is ending up in while(list_empty()).

You are right. I managed to confuse myself. The thing is that the batch
count is out of sync.

> > Your changelog says that the fix is to
> > use the proper value without any specifics.
> > 
> Will change this to: Ensure the count value passed is not greater than
> the pcp lists count. Any better you suggest?

Yes, this makes it more clear.

Feel free to add
Acked-by: Michal Hocko <mhocko@xxxxxxxx>
-- 
Michal Hocko
SUSE Labs




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux