On 10.08.20 18:10, Charan Teja Reddy wrote:
> The following race is observed with repeated online, offline and a
> delay between two successive onlines of memory blocks in the movable
> zone.
>
> P1                                      P2
>
> Online the first memory block in
> the movable zone. The pcp struct
> values are initialized to default
> values, i.e., pcp->high = 0 &
> pcp->batch = 1.
>
>                                         Allocate the pages from the
>                                         movable zone.
>
> Try to online the second memory
> block in the movable zone, thus it
> entered online_pages() but is yet
> to call zone_pcp_update().
>
>                                         This process entered the exit
>                                         path, thus it tries to release
>                                         the order-0 pages to the pcp
>                                         lists through
>                                         free_unref_page_commit().
>
>                                         As pcp->high = 0 and
>                                         pcp->count = 1, it proceeds to
>                                         call free_pcppages_bulk().
>
> Update the pcp values; the new pcp
> values are, say, pcp->high = 378,
> pcp->batch = 63.
>
>                                         Read the pcp's batch value
>                                         using READ_ONCE() and pass it
>                                         to free_pcppages_bulk(); the
>                                         values passed here are
>                                         batch = 63, count = 1.
>
>                                         Since the number of pages in
>                                         the pcp lists is less than
>                                         ->batch, it gets stuck in the
>                                         while (list_empty(list)) loop
>                                         with interrupts disabled, thus
>                                         hanging the core.
>
> Avoid this by ensuring free_pcppages_bulk() is called with the proper
> count of pcp list pages.
>
> The mentioned race is somewhat easily reproducible without [1] because
> the pcp's are not updated for the first memory block online, and thus
> there is a big enough race window for P2 between alloc+free and the
> pcp struct values update through onlining of the second memory block.
>
> With [1], the race still exists, but it is much narrower as we update
> the pcp struct values for the first memory block online itself.
>
> [1]: https://patchwork.kernel.org/patch/11696389/
>
> Signed-off-by: Charan Teja Reddy <charante@xxxxxxxxxxxxxx>
> ---
>  mm/page_alloc.c | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index e4896e6..25e7e12 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3106,6 +3106,7 @@ static void free_unref_page_commit(struct page *page, unsigned long pfn)
>  	struct zone *zone = page_zone(page);
>  	struct per_cpu_pages *pcp;
>  	int migratetype;
> +	int high;
>  
>  	migratetype = get_pcppage_migratetype(page);
>  	__count_vm_event(PGFREE);
> @@ -3128,8 +3129,19 @@ static void free_unref_page_commit(struct page *page, unsigned long pfn)
>  	pcp = &this_cpu_ptr(zone->pageset)->pcp;
>  	list_add(&page->lru, &pcp->lists[migratetype]);
>  	pcp->count++;
> -	if (pcp->count >= pcp->high) {
> -		unsigned long batch = READ_ONCE(pcp->batch);
> +	high = READ_ONCE(pcp->high);
> +	if (pcp->count >= high) {
> +		int batch;
> +
> +		batch = READ_ONCE(pcp->batch);
> +		/*
> +		 * For non-default pcp struct values, high is always
> +		 * greater than the batch. If high < batch then pass
> +		 * proper count to free the pcp's list pages.
> +		 */
> +		if (unlikely(high < batch))
> +			batch = min(pcp->count, batch);
> +
>  		free_pcppages_bulk(zone, batch, pcp);
>  	}
>  }
>

I was wondering if we should rather set all pageblocks to MIGRATE_ISOLATE
in online_pages() before doing the online_pages_range() call, and do
undo_isolate_page_range() after onlining is done.

move_pfn_range_to_zone()->memmap_init_zone() marks all pageblocks
MIGRATE_MOVABLE, and as that function is also used during boot, we could
supply a parameter to configure this.
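Roughly something like the following, completely untested; note that the
extra migratetype parameter for move_pfn_range_to_zone()/memmap_init_zone()
does not exist yet and is only there to illustrate the idea:

	online_pages():
		...
		/* Mark all pageblocks MIGRATE_ISOLATE instead of MIGRATE_MOVABLE. */
		move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_ISOLATE);
		...
		/* Pages get freed to the buddy but cannot get allocated yet. */
		online_pages_range(pfn, nr_pages);
		...
		/* Mark sections online, rebuild zonelists / zone_pcp_update(), ... */
		...
		/* Only now actually expose the pages to the allocator. */
		undo_isolate_page_range(pfn, pfn + nr_pages, MIGRATE_MOVABLE);

undo_isolate_page_range() would then also take care of moving the isolated
free pages over to the MIGRATE_MOVABLE freelists.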
This would prevent another race from happening: having pages exposed to the
buddy, ready for allocation, in online_pages_range() before the sections
are marked online. This would avoid any pages getting allocated before
we're completely done onlining.

We would need MIGRATE_ISOLATE/CONFIG_MEMORY_ISOLATION also for
CONFIG_MEMORY_HOTPLUG.

-- 
Thanks,

David / dhildenb