On Wed, May 11, 2022 at 05:12:07PM +0900, Wonhyuk Yang wrote:
> Currently, the tracepoint mm_page_alloc_zone_locked() doesn't show
> correct information.
>
> First, when alloc_flags has ALLOC_HARDER/ALLOC_CMA, the page can
> be allocated from MIGRATE_HIGHATOMIC/MIGRATE_CMA. Nevertheless, the
> tracepoint uses the requested migration type, not MIGRATE_HIGHATOMIC
> or MIGRATE_CMA.
>
> Second, after commit 44042b4498728 ("mm/page_alloc: allow high-order
> pages to be stored on the per-cpu lists"), the per-cpu list can store
> high-order pages. But the tracepoint determines whether it is a refill
> of the per-cpu list by comparing the requested order with 0.
>
> To handle these problems, use the migration type cached by
> get_pcppage_migratetype() instead of the requested migration type.
> Then, make mm_page_alloc_zone_locked() be called from only two contexts
> (rmqueue_bulk, rmqueue). With a new argument called percpu_refill,
> it can show correctly whether it is a refill of the per-cpu list.
>

You're definitely right that the current tracepoint is broken. I got
momentarily confused because HIGHATOMIC and CMA pages are not stored on
PCP lists even though they are pageblock migrate types. Superficially,
calling get_pcppage_migratetype() on a page that cannot be a PCP page
seems silly, but in the context of this patch it happens to work because
the page was isolated with __rmqueue_smallest(), which sets the PCP type
even if the page is not going to a PCP list.
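For reference, the helpers in question are just trivial accessors
around a migratetype value cached in the struct page when it is
isolated. Roughly (a sketch from memory of mm/page_alloc.c in this
era, comments mine):

	/* Cache the migratetype the page was isolated with */
	static inline void set_pcppage_migratetype(struct page *page,
							int migratetype)
	{
		page->index = migratetype;
	}

	/*
	 * Read the cached value back. It is valid for any page that
	 * went through __rmqueue_smallest(), whether or not the page
	 * ends up on a PCP list.
	 */
	static inline int get_pcppage_migratetype(struct page *page)
	{
		return page->index;
	}

which is why reading the cached type is safe here even for
HIGHATOMIC/CMA pages that will never sit on a PCP list.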
The original intent of that tracepoint was to trace when pages were
removed from the buddy list. That would suggest this untested patch on
top of yours as a simplification:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0351808322ba..66a70b898130 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2476,6 +2476,8 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
 		del_page_from_free_list(page, zone, current_order);
 		expand(zone, page, order, current_order, migratetype);
 		set_pcppage_migratetype(page, migratetype);
+		trace_mm_page_alloc_zone_locked(page, order, migratetype,
+				pcp_allowed_order(order) && migratetype < MIGRATE_PCPTYPES);
 		return page;
 	}
 
@@ -3025,7 +3027,6 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 			int migratetype, unsigned int alloc_flags)
 {
 	int i, allocated = 0;
-	int mt;
 
 	/*
 	 * local_lock_irq held so equivalent to spin_lock_irqsave for
@@ -3053,9 +3054,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 		 */
 		list_add_tail(&page->lru, list);
 		allocated++;
-		mt = get_pcppage_migratetype(page);
-		trace_mm_page_alloc_zone_locked(page, order, mt, true);
-		if (is_migrate_cma(mt))
+		if (is_migrate_cma(get_pcppage_migratetype(page)))
 			__mod_zone_page_state(zone, NR_FREE_CMA_PAGES,
 					      -(1 << order));
 	}
@@ -3704,7 +3703,6 @@ struct page *rmqueue(struct zone *preferred_zone,
 {
 	unsigned long flags;
 	struct page *page;
-	int mt;
 
 	if (likely(pcp_allowed_order(order))) {
 		/*
@@ -3734,17 +3732,15 @@ struct page *rmqueue(struct zone *preferred_zone,
 		 * reserved for high-order atomic allocation, so order-0
 		 * request should skip it.
 		 */
-		if (order > 0 && alloc_flags & ALLOC_HARDER) {
+		if (order > 0 && alloc_flags & ALLOC_HARDER)
 			page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
-		}
 		if (!page) {
 			page = __rmqueue(zone, order, migratetype, alloc_flags);
 			if (!page)
 				goto failed;
 		}
-		mt = get_pcppage_migratetype(page);
-		trace_mm_page_alloc_zone_locked(page, order, mt, false);
-		__mod_zone_freepage_state(zone, -(1 << order), mt);
+		__mod_zone_freepage_state(zone, -(1 << order),
+					  get_pcppage_migratetype(page));
 
 		spin_unlock_irqrestore(&zone->lock, flags);
 	} while (check_new_pages(page, order));
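For completeness, the fourth tracepoint argument computed above is
meant to approximate "this removal is refilling a PCP list".
pcp_allowed_order() was introduced by commit 44042b4498728 and looks
roughly like this (again a sketch from memory):

	static inline bool pcp_allowed_order(unsigned int order)
	{
		if (order <= PAGE_ALLOC_COSTLY_ORDER)
			return true;
	#ifdef CONFIG_TRANSPARENT_HUGEPAGE
		if (order == pageblock_order)
			return true;
	#endif
		return false;
	}

so percpu_refill ends up true only for orders that can be stored on a
PCP list and for migratetypes below MIGRATE_PCPTYPES, which excludes
the HIGHATOMIC and CMA pages isolated directly under the zone lock.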