Re: [PATCH] mm/page_alloc: Fix tracepoint mm_page_alloc_zone_locked()

Wonhyuk Yang <vvghjk1234@xxxxxxxxx> · Thu, 12 May 2022 00:02:30 +0900

On Wed, May 11, 2022 at 11:23 PM Mel Gorman <mgorman@xxxxxxx> wrote:
>
> On Wed, May 11, 2022 at 05:12:07PM +0900, Wonhyuk Yang wrote:
> > Currently, trace point mm_page_alloc_zone_locked() doesn't show
> > correct information.
> >
> > First, when alloc_flags has ALLOC_HARDER/ALLOC_CMA, a page can
> > be allocated from MIGRATE_HIGHATOMIC/MIGRATE_CMA. Nevertheless, the
> > tracepoint reports the requested migration type, not
> > MIGRATE_HIGHATOMIC or MIGRATE_CMA.
> >
> > Second, after commit 44042b4498728 ("mm/page_alloc: allow high-order
> > pages to be stored on the per-cpu lists") the percpu-list can store
> > high-order pages. But the tracepoint determines whether it is a refill
> > of the percpu-list by comparing the requested order with 0.
> >
> > To handle these problems, use the cached migration type from
> > get_pcppage_migratetype() instead of the requested migration type.
> > Then, make mm_page_alloc_zone_locked() be called from only two contexts
> > (rmqueue_bulk, rmqueue). With a new argument called percpu_refill,
> > it can correctly show whether it is a refill of the percpu-list.
> >
>
> You're definitely right that the current tracepoint is broken.
>
> I got momentarily confused because HIGHATOMIC and CMA are not stored on
> PCP lists even though they are a pageblock migrate type. Superficially
> calling get_pcppage_migratetype on a page that cannot be a PCP page
> seems silly but in the context of this patch, it happens to work because
> it was isolated with __rmqueue_smallest which sets the PCP type even if
> the page is not going to a PCP list.

Yes, I agree that calling get_pcppage_migratetype() looks quite confusing.

> The original intent of that tracepoint was to trace when pages were
> removed from the buddy list. That would suggest this untested patch on
> top of yours as a simplification;
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 0351808322ba..66a70b898130 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2476,6 +2476,8 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
>                 del_page_from_free_list(page, zone, current_order);
>                 expand(zone, page, order, current_order, migratetype);
>                 set_pcppage_migratetype(page, migratetype);
> +               trace_mm_page_alloc_zone_locked(page, order, migratetype,
> +                       pcp_allowed_order(order) && migratetype < MIGRATE_PCPTYPES);
>                 return page;
>         }

Interestingly, my first approach was quite similar to your suggestion. But I
noticed that there can be a request whose migration type is MOVABLE
and whose alloc_flags doesn't have ALLOC_CMA. In that case, pages are marked
as percpu-refill even though they were allocated from the buddy list directly.
Is it okay to just ignore this case?
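
To make that case concrete, here is a minimal userspace sketch (not kernel
code). It assumes CONFIG_CMA is enabled and models, as I understand it, the
check in rmqueue() that skips the pcp list for MOVABLE requests without
ALLOC_CMA. The enum values, the ALLOC_CMA value, and the helper names
rmqueue_uses_pcplist() and suggested_percpu_refill() are illustrative
assumptions, and pcp_allowed_order() is simplified to ignore the THP
pageblock_order case.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins; only the relative ordering matters for this model. */
enum migratetype {
	MIGRATE_UNMOVABLE,
	MIGRATE_MOVABLE,
	MIGRATE_RECLAIMABLE,
	MIGRATE_PCPTYPES,			/* number of types kept on pcp lists */
	MIGRATE_HIGHATOMIC = MIGRATE_PCPTYPES,
	MIGRATE_CMA,
};

#define ALLOC_CMA		0x80	/* assumed flag value for this model */
#define PAGE_ALLOC_COSTLY_ORDER	3

/* Simplified: the THP pageblock_order case is ignored. */
static bool pcp_allowed_order(unsigned int order)
{
	return order <= PAGE_ALLOC_COSTLY_ORDER;
}

/*
 * Hypothetical helper: models whether rmqueue() would take the pcp-list
 * path, assuming CONFIG_CMA is enabled.
 */
static bool rmqueue_uses_pcplist(unsigned int order, int migratetype,
				 unsigned int alloc_flags)
{
	if (!pcp_allowed_order(order))
		return false;
	return (alloc_flags & ALLOC_CMA) || migratetype != MIGRATE_MOVABLE;
}

/* The percpu_refill value the suggested tracepoint call would report. */
static bool suggested_percpu_refill(unsigned int order, int migratetype)
{
	return pcp_allowed_order(order) && migratetype < MIGRATE_PCPTYPES;
}

/* Order-0 MOVABLE request without ALLOC_CMA: the two answers disagree. */
static void show_mismatch(void)
{
	unsigned int order = 0;

	/* The model bypasses the pcp list and goes to the buddy list ... */
	assert(!rmqueue_uses_pcplist(order, MIGRATE_MOVABLE, 0));
	/* ... yet the predicate still reports percpu_refill = true. */
	assert(suggested_percpu_refill(order, MIGRATE_MOVABLE));
}
```

With ALLOC_CMA set, or with any other pcp migration type, both helpers agree
in this model, so the mismatch is limited to that one combination.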

> @@ -3025,7 +3027,6 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
>                         int migratetype, unsigned int alloc_flags)
>  {
>         int i, allocated = 0;
> -       int mt;
>
>         /*
>          * local_lock_irq held so equivalent to spin_lock_irqsave for
> @@ -3053,9 +3054,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
>                  */
>                 list_add_tail(&page->lru, list);
>                 allocated++;
> -               mt = get_pcppage_migratetype(page);
> -               trace_mm_page_alloc_zone_locked(page, order, mt, true);
> -               if (is_migrate_cma(mt))
> +               if (is_migrate_cma(get_pcppage_migratetype(page)))
>                         __mod_zone_page_state(zone, NR_FREE_CMA_PAGES,
>                                               -(1 << order));
>         }
> @@ -3704,7 +3703,6 @@ struct page *rmqueue(struct zone *preferred_zone,
>  {
>         unsigned long flags;
>         struct page *page;
> -       int mt;
>
>         if (likely(pcp_allowed_order(order))) {
>                 /*
> @@ -3734,17 +3732,15 @@ struct page *rmqueue(struct zone *preferred_zone,
>                  * reserved for high-order atomic allocation, so order-0
>                  * request should skip it.
>                  */
> -               if (order > 0 && alloc_flags & ALLOC_HARDER) {
> +               if (order > 0 && alloc_flags & ALLOC_HARDER)
>                         page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
> -               }
>                 if (!page) {
>                         page = __rmqueue(zone, order, migratetype, alloc_flags);
>                         if (!page)
>                                 goto failed;
>                 }
> -               mt = get_pcppage_migratetype(page);
> -               trace_mm_page_alloc_zone_locked(page, order, mt, false);
> -               __mod_zone_freepage_state(zone, -(1 << order), mt);
> +               __mod_zone_freepage_state(zone, -(1 << order),
> +                                         get_pcppage_migratetype(page));
>                 spin_unlock_irqrestore(&zone->lock, flags);
>         } while (check_new_pages(page, order));
>