On 11.08.20 15:11, Charan Teja Kalla wrote:
> Thanks David for the comments.
> 
> On 8/11/2020 1:59 PM, David Hildenbrand wrote:
>> On 10.08.20 18:10, Charan Teja Reddy wrote:
>>> The following race is observed with repeated online, offline and a
>>> delay between two successive onlines of memory blocks of the
>>> movable zone.
>>>
>>> P1                                  P2
>>>
>>> Online the first memory block in
>>> the movable zone. The pcp struct
>>> values are initialized to default
>>> values, i.e., pcp->high = 0 &
>>> pcp->batch = 1.
>>>
>>>                                     Allocate pages from the
>>>                                     movable zone.
>>>
>>> Try to online the second memory
>>> block in the movable zone, thus
>>> it entered online_pages() but has
>>> yet to call zone_pcp_update().
>>>                                     This process entered the exit
>>>                                     path, thus it tries to release
>>>                                     the order-0 pages to the pcp
>>>                                     lists through
>>>                                     free_unref_page_commit().
>>>
>>>                                     As pcp->high = 0 and
>>>                                     pcp->count = 1, it proceeds to
>>>                                     call free_pcppages_bulk().
>>>
>>> Update the pcp values, thus the
>>> new pcp values are, say,
>>> pcp->high = 378, pcp->batch = 63.
>>>
>>>                                     Read the pcp's batch value using
>>>                                     READ_ONCE() and pass it to
>>>                                     free_pcppages_bulk(); the values
>>>                                     passed here are batch = 63,
>>>                                     count = 1.
>>>
>>>                                     Since the number of pages on the
>>>                                     pcp lists is less than ->batch,
>>>                                     it gets stuck in the
>>>                                     while (list_empty(list)) loop
>>>                                     with interrupts disabled, thus
>>>                                     a core hang.
>>>
>>> Avoid this by ensuring that free_pcppages_bulk() is called with a
>>> proper count of pcp list pages.
>>>
>>> The mentioned race is somewhat easily reproducible without [1]
>>> because the pcp's are not updated for the first memory block
>>> online, and thus there is a wide enough race window for P2 between
>>> alloc+free and the pcp struct values update through the onlining of
>>> the second memory block.
>>>
>>> With [1], the race still exists but it is much narrower, as we
>>> update the pcp struct values already when onlining the first memory
>>> block.
>>>
>>> [1]: https://patchwork.kernel.org/patch/11696389/
>>>
>>> Signed-off-by: Charan Teja Reddy <charante@xxxxxxxxxxxxxx>
>>> ---
>>>  mm/page_alloc.c | 16 ++++++++++++++--
>>>  1 file changed, 14 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index e4896e6..25e7e12 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -3106,6 +3106,7 @@ static void free_unref_page_commit(struct page *page, unsigned long pfn)
>>>  	struct zone *zone = page_zone(page);
>>>  	struct per_cpu_pages *pcp;
>>>  	int migratetype;
>>> +	int high;
>>>  
>>>  	migratetype = get_pcppage_migratetype(page);
>>>  	__count_vm_event(PGFREE);
>>> @@ -3128,8 +3129,19 @@ static void free_unref_page_commit(struct page *page, unsigned long pfn)
>>>  	pcp = &this_cpu_ptr(zone->pageset)->pcp;
>>>  	list_add(&page->lru, &pcp->lists[migratetype]);
>>>  	pcp->count++;
>>> -	if (pcp->count >= pcp->high) {
>>> -		unsigned long batch = READ_ONCE(pcp->batch);
>>> +	high = READ_ONCE(pcp->high);
>>> +	if (pcp->count >= high) {
>>> +		int batch;
>>> +
>>> +		batch = READ_ONCE(pcp->batch);
>>> +		/*
>>> +		 * For non-default pcp struct values, high is always
>>> +		 * greater than the batch. If high < batch then pass
>>> +		 * proper count to free the pcp's list pages.
>>> +		 */
>>> +		if (unlikely(high < batch))
>>> +			batch = min(pcp->count, batch);
>>> +
>>>  		free_pcppages_bulk(zone, batch, pcp);
>>>  	}
>>>  }
>>>
>>
>> I was wondering if we should rather set all pageblocks to
>> MIGRATE_ISOLATE in online_pages() before doing the online_pages_range()
>> call, and do undo_isolate_page_range() after onlining is done.
>>
>> move_pfn_range_to_zone()->memmap_init_zone() marks all pageblocks
>> MIGRATE_MOVABLE, and as that function is also used during boot, we
>> could supply a parameter to configure this.
>>
>> This would prevent another race from happening: having pages exposed
>> to the buddy, ready for allocation, in online_pages_range() before
>> the sections are marked online.
>
> Yeah, this is another bug. And the idea of isolating first, onlining,
> and undoing the isolation after the zonelist and pcp struct updates
> should work even for the mentioned issue. Does this need to go as a
> separate fix?

Yeah, and it also requires more work to be done. Will add it to my list
of TODOs (rough sketch at the end of this mail).

>
> However, IMO, the issue in free_pcppages_bulk() should be fixed by
> checking whether a sane count value is passed, no?
>
> Posted V2: https://patchwork.kernel.org/patch/11709225/

Yeah, I'm fine with fixing this issue that can actually be reproduced.
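For anyone hitting this thread via the archives: the hang described
above comes from the list selection in free_pcppages_bulk(). Heavily
simplified (locking, prefetching and the actual handover to the buddy
are elided, so treat this as a sketch rather than the exact upstream
code), the relevant part looks like:

static void free_pcppages_bulk(struct zone *zone, int count,
			       struct per_cpu_pages *pcp)
{
	int migratetype = 0;
	int batch_free = 0;
	struct page *page;

	while (count) {
		struct list_head *list;

		/*
		 * Pick the next non-empty pcp list in round-robin
		 * fashion. If count is bigger than the number of
		 * pages actually sitting on the lists, all lists end
		 * up empty and this inner loop never terminates.
		 */
		do {
			batch_free++;
			if (++migratetype == MIGRATE_PCPTYPES)
				migratetype = 0;
			list = &pcp->lists[migratetype];
		} while (list_empty(list));

		do {
			page = list_last_entry(list, struct page, lru);
			/* must delete to avoid corrupting the pcp list */
			list_del(&page->lru);
			pcp->count--;
			/* ... queue the page for return to the buddy ... */
		} while (--count && --batch_free && !list_empty(list));
	}
}

We get here with interrupts disabled, so spinning in that do/while is a
hard hang, which is why the sanity check on the passed count that the
V2 above proposes is sufficient to avoid it.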
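And to make the isolate-first idea a bit more concrete, the flow I have
in mind is roughly the following. Note that this is only a sketch: the
migratetype argument to move_pfn_range_to_zone() is the hypothetical
"parameter to configure this" mentioned above (today memmap_init_zone()
hardcodes MIGRATE_MOVABLE), the real online_pages() signature differs,
and locking, error handling and the zonelist-rebuild-vs-pcp-update
conditionals are all left out:

/* sketch of a restructured online_pages(), not a tested patch */
int online_pages(unsigned long pfn, unsigned long nr_pages,
		 struct zone *zone)
{
	/*
	 * Create the memmap with every pageblock MIGRATE_ISOLATE
	 * instead of MIGRATE_MOVABLE (hypothetical new parameter).
	 */
	move_pfn_range_to_zone(zone, pfn, nr_pages, MIGRATE_ISOLATE);

	/*
	 * Expose the pages to the buddy. As they are isolated, nothing
	 * can allocate them yet, so nobody trips over stale pcp or
	 * zonelist state.
	 */
	online_pages_range(pfn, nr_pages);

	/* Bring zonelists and the pcp high/batch values up to date. */
	build_all_zonelists(NULL);
	zone_pcp_update(zone);

	/* Mark the sections online before allowing any allocations. */
	online_mem_sections(pfn, pfn + nr_pages);

	/* Finally, make the new range allocatable. */
	undo_isolate_page_range(pfn, pfn + nr_pages, MIGRATE_MOVABLE);
	return 0;
}

That would close both windows at once: the pcp race discussed here and
pages being allocatable before the sections are marked online.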
-- 
Thanks,

David / dhildenb