On Fri, Dec 09, 2022 at 11:23:50PM +0100, Vlastimil Babka wrote:
> On 12/9/22 20:26, Kirill A. Shutemov wrote:
> >> >  #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
> >> >  			/*
> >> >  			 * Watermark failed for this zone, but see if we can
> >> > @@ -4299,6 +4411,9 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
> >> >  
> >> >  			return page;
> >> >  		} else {
> >> > +			if (try_to_accept_memory(zone))
> >> > +				goto try_this_zone;
> >> 
> >> On the other hand, here we failed the full rmqueue(), including the
> >> potentially fragmenting fallbacks, so I'm worried that before we finally
> >> fail all of that and resort to accepting more memory, we already
> >> fragmented the already accepted memory, more than necessary.
> > 
> > I'm not sure I follow. We accept memory in pageblock chunks. Do we want
> > to allocate from a free pageblock if we have other memory to tap from?
> > It doesn't make sense to me.
> 
> The fragmentation avoidance based on migratetype does work with pageblock
> granularity, so yeah, if you accept a single pageblock worth of memory and
> then (through __rmqueue_fallback()) end up serving both movable and
> unmovable allocations from it, the whole fragmentation avoidance mechanism
> is defeated and you end up with unmovable allocations (e.g. page tables)
> scattered over many pageblocks and an inability to allocate any huge pages.
> 
> >> So one way to prevent that would be to move the acceptance into
> >> rmqueue() to happen before __rmqueue_fallback(), which I originally had
> >> in mind and maybe suggested previously.
> > 
> > I guess it should be pretty straightforward to fail __rmqueue_fallback()
> > if there's a non-empty unaccepted_pages list and steer to
> > try_to_accept_memory() this way.
> 
> That could be a way indeed. We do have ALLOC_NOFRAGMENT, which it could be
> possible to employ here.
> But maybe the zone_watermark_fast() modification would be simpler yet
> sufficient. It makes sense to me that we'd try to keep a high watermark
> worth of pre-accepted memory. zone_watermark_fast() would fail at the low
> watermark, so we could try accepting (high-low) at a time instead of a
> single pageblock.

Looks like we already have __zone_watermark_unusable_free(), which seems to
match the use case rather closely. We only need to switch unaccepted memory
to per-zone accounting.

The fixup below is supposed to do the trick, but I'm not sure how to test
fragmentation avoidance properly. Any suggestions?
diff --git a/drivers/base/node.c b/drivers/base/node.c
index ca6f0590be21..1bd2d245edee 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -483,7 +483,7 @@ static ssize_t node_read_meminfo(struct device *dev,
 #endif
 #ifdef CONFIG_UNACCEPTED_MEMORY
 			     ,
-			     nid, K(node_page_state(pgdat, NR_UNACCEPTED))
+			     nid, K(sum_zone_node_page_state(nid, NR_UNACCEPTED))
 #endif
 			     );
 	len += hugetlb_report_node_meminfo(buf, len, nid);
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 789b77c7b6df..e9c05b4c457c 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -157,7 +157,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 
 #ifdef CONFIG_UNACCEPTED_MEMORY
 	show_val_kb(m, "Unaccepted:     ",
-		    global_node_page_state(NR_UNACCEPTED));
+		    global_zone_page_state(NR_UNACCEPTED));
 #endif
 
 	hugetlb_report_meminfo(m);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 9c762e8175fc..8b5800cd4424 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -152,6 +152,9 @@ enum zone_stat_item {
 	NR_ZSPAGES,		/* allocated in zsmalloc */
 #endif
 	NR_FREE_CMA_PAGES,
+#ifdef CONFIG_UNACCEPTED_MEMORY
+	NR_UNACCEPTED,
+#endif
 	NR_VM_ZONE_STAT_ITEMS };
 
 enum node_stat_item {
@@ -198,9 +201,6 @@ enum node_stat_item {
 	NR_FOLL_PIN_ACQUIRED,	/* via: pin_user_page(), gup flag: FOLL_PIN */
 	NR_FOLL_PIN_RELEASED,	/* pages returned via unpin_user_page() */
 	NR_KERNEL_STACK_KB,	/* measured in KiB */
-#ifdef CONFIG_UNACCEPTED_MEMORY
-	NR_UNACCEPTED,
-#endif
 #if IS_ENABLED(CONFIG_SHADOW_CALL_STACK)
 	NR_KERNEL_SCS_KB,	/* measured in KiB */
 #endif
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e80e8d398863..404b267332a9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1779,7 +1779,7 @@ static bool try_to_accept_memory(struct zone *zone)
 
 	migratetype = get_pfnblock_migratetype(page, page_to_pfn(page));
 	__mod_zone_freepage_state(zone, -1 << order, migratetype);
-	__mod_node_page_state(page_pgdat(page), NR_UNACCEPTED, -1 << order);
+	__mod_zone_page_state(zone, NR_UNACCEPTED, -1 << order);
 	spin_unlock_irqrestore(&zone->lock, flags);
 
 	if (last)
@@ -1808,7 +1808,7 @@ static void __free_unaccepted(struct page *page, unsigned int order)
 	migratetype = get_pfnblock_migratetype(page, page_to_pfn(page));
 	list_add_tail(&page->lru, &zone->unaccepted_pages);
 	__mod_zone_freepage_state(zone, 1 << order, migratetype);
-	__mod_node_page_state(page_pgdat(page), NR_UNACCEPTED, 1 << order);
+	__mod_zone_page_state(zone, NR_UNACCEPTED, 1 << order);
 	spin_unlock_irqrestore(&zone->lock, flags);
 
 	if (first)
@@ -4074,6 +4074,9 @@ static inline long __zone_watermark_unusable_free(struct zone *z,
 	if (!(alloc_flags & ALLOC_CMA))
 		unusable_free += zone_page_state(z, NR_FREE_CMA_PAGES);
 #endif
+#ifdef CONFIG_UNACCEPTED_MEMORY
+	unusable_free += zone_page_state(z, NR_UNACCEPTED);
+#endif
 
 	return unusable_free;
 }
-- 
Kiryl Shutsemau / Kirill A. Shutemov
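
For illustration, a rough and untested sketch of the other idea above:
have try_to_accept_memory() accept enough memory per call to get the zone
back above the high watermark, rather than a single chunk. Note that
try_to_accept_memory_one() is a hypothetical helper here, wrapping the
existing list_first_entry()/accept_page()/counter-update body of
try_to_accept_memory() from the patch, and the order argument would need
plumbing through from get_page_from_freelist():

static bool try_to_accept_memory(struct zone *zone, unsigned int order)
{
	long to_accept;
	bool ret = false;

	/*
	 * Distance to the high watermark, with unaccepted memory treated
	 * as unusable free memory, as in the fixup above.
	 */
	to_accept = high_wmark_pages(zone) -
		    (zone_page_state(zone, NR_FREE_PAGES) -
		     __zone_watermark_unusable_free(zone, order, 0));

	/* Accept at least one chunk */
	do {
		if (!try_to_accept_memory_one(zone))
			break;
		ret = true;
		to_accept -= MAX_ORDER_NR_PAGES;
	} while (to_accept > 0);

	return ret;
}

Combined with counting NR_UNACCEPTED in __zone_watermark_unusable_free(),
zone_watermark_fast() would start failing at the low watermark while
unaccepted memory remains, so each accept round would top the zone back up
to the high watermark, giving roughly the (high-low)-at-a-time behaviour
suggested above.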