On 9/11/23 21:41, Johannes Weiner wrote:
> Free page accounting currently happens a bit too high up the call
> stack, where it has to deal with guard pages, compaction capturing,
> block stealing and even page isolation. This is subtle and fragile,
> and makes it difficult to hack on the code.
>
> Now that type violations on the freelists have been fixed, push the
> accounting down to where pages enter and leave the freelist.
>
> v3:
> - fix CONFIG_UNACCEPTED_MEMORY build (lkp)
> v2:
> - fix CONFIG_DEBUG_PAGEALLOC build (Mel)
>
> Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>

<snip>

>
>  /* Used for pages not on another list */
> -static inline void add_to_free_list_tail(struct page *page, struct zone *zone,
> -                                         unsigned int order, int migratetype)
> +static inline void add_to_free_list(struct page *page, struct zone *zone,
> +                                    unsigned int order, int migratetype,
> +                                    bool tail)
>  {
>      struct free_area *area = &zone->free_area[order];
>
> -    list_add_tail(&page->buddy_list, &area->free_list[migratetype]);
> +    VM_WARN_ONCE(get_pageblock_migratetype(page) != migratetype,
> +                 "page type is %lu, passed migratetype is %d (nr=%d)\n",
> +                 get_pageblock_migratetype(page), migratetype, 1 << order);

Ok, IIUC you now assume that the pageblock migratetype matches the freelist
placement at all times. This is a change from the previous treatment as a
heuristic that may sometimes be imprecise. Let's assume the previous patches
handled the deterministic reasons why those would deviate (modulo my concern
about pageblocks spanning multiple zones in my reply to 5/6).

But unless I'm missing something, I don't think the possible race scenarios
were dealt with? Pageblock migratetype is set under zone->lock, but there
are places that read it outside of zone->lock and then trust it to perform
the freelist placement. See for example __free_pages_ok(), or
free_unref_page() in the cases where it calls free_one_page(). These
determine the pageblock migratetype before taking the zone->lock.
Only in the has_isolate_pageblock() cases are we more careful, because
previously isolation was the only case where precision was needed. So I
think this warning is going to trigger?

> +
> +    if (tail)
> +        list_add_tail(&page->buddy_list, &area->free_list[migratetype]);
> +    else
> +        list_add(&page->buddy_list, &area->free_list[migratetype]);
>      area->nr_free++;
> +
> +    account_freepages(page, zone, 1 << order, migratetype);
>  }
>
>  /*

<snip>

> @@ -757,23 +783,21 @@ static inline void __free_one_page(struct page *page,
>      VM_BUG_ON_PAGE(page->flags & PAGE_FLAGS_CHECK_AT_PREP, page);
>
>      VM_BUG_ON(migratetype == -1);
> -    if (likely(!is_migrate_isolate(migratetype)))
> -        __mod_zone_freepage_state(zone, 1 << order, migratetype);
> -
>      VM_BUG_ON_PAGE(pfn & ((1 << order) - 1), page);
>      VM_BUG_ON_PAGE(bad_range(zone, page), page);
>
>      while (order < MAX_ORDER) {
> -        if (compaction_capture(capc, page, order, migratetype)) {
> -            __mod_zone_freepage_state(zone, -(1 << order),
> -                                      migratetype);
> +        int buddy_mt;
> +
> +        if (compaction_capture(capc, page, order, migratetype))
>              return;
> -        }
>
>          buddy = find_buddy_page_pfn(page, pfn, order, &buddy_pfn);
>          if (!buddy)
>              goto done_merging;
>
> +        buddy_mt = get_pfnblock_migratetype(buddy, buddy_pfn);

You should be able to assume buddy_mt equals migratetype here, no? It's the
same assumption as in the VM_WARN_ONCE() I discussed above?

> +
>          if (unlikely(order >= pageblock_order)) {

Only here can buddy_mt differ, and the code in this block already handles
that.

>              /*
>               * We want to prevent merge between freepages on pageblock
> @@ -801,9 +825,9 @@ static inline void __free_one_page(struct page *page,
>               * merge with it and move up one order.
>               */
>              if (page_is_guard(buddy))
> -                clear_page_guard(zone, buddy, order, migratetype);
> +                clear_page_guard(zone, buddy, order);
>              else
> -                del_page_from_free_list(buddy, zone, order);
> +                del_page_from_free_list(buddy, zone, order, buddy_mt);

Ugh, so this adds an account_freepages() call to each iteration of the
__free_one_page() hot loop, which seems like a lot of unnecessary overhead.
As long as we are within pageblock_order, the migratetype should be the
same, and thus the is_migrate_isolate() and is_migrate_cma() tests should
return the same value, so we shouldn't need to call __mod_zone_page_state()
piecemeal like this.

>          combined_pfn = buddy_pfn & pfn;
>          page = page + (combined_pfn - pfn);
>          pfn = combined_pfn;