+Cc Mel, Johannes and Brendan, please keep on future submissions On 2/26/25 03:41, zhongjinji@xxxxxxxxx wrote: > From: zhongjinji <zhongjinji@xxxxxxxxx> > > In the past, nr_reserved_highatomic was considered to be part of > unusable_free when there was no ALLOC_RESERVES flag. To prevent > unusable_free from being too large, it is reasonable to set a > fixed maximum highatomic value. > > Even if the maximum number of highatomic pageblocks is set to be larger, > unusable_free may not increase, since Yu Zhao provided the modification > about nr_free_highatomic in > https://lore.kernel.org/all/20241028182653.3420139-1-yuzhao@xxxxxxxxxx/T/#u I think I raised in discussing that patch the fact that unreserve_highatomic_pageblock() is flawed in that it will only find a highatomic block to unreserve if it has some free pages as it searches via freelist. So if the existing reserve becomes fully used up, it's dead weight, unless the highatomic allocations are shortlived. It could make sense to remove the HIGHATOMIC marking so other, more free blocks might be reserved instead. I fear if this approach isn't just working around that limitation. > More highatomic pageblocks are beneficial for the successful allocation > of high-order page, which is helpful in some devices. Therefore, use Can you elaborate what is it helpful with? And what kind of values you expect to be using? Do you find that your existing HIGHATOMIC blocks are fully depleted as I speculate above? Are your highatomic allocations short or long lived? Maybe simply at this point nr_free_highatomic would be a better metric than nr_reserved_highatomic to decide if we should reserve another pageblock? The counter didn't exist when highatomic reserves were introduced. But maybe also some mechanism (compaction maybe?) could also be looking for highatomic blocks that are fully used and unreserve them? > highatomic_reserve_ratio to adjust the maximum number of highatomic > pageblocks. > > Signed-off-by: zhongjinji <zhongjinji@xxxxxxxxx> Minimally this needs to be documented in e.g. Documentation/admin-guide/sysctl/vm.rst, it's not obvious that the value is divided by 1000 and not 100 for example. > --- > mm/page_alloc.c | 11 ++++++++++- > 1 file changed, 10 insertions(+), 1 deletion(-) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 579789600a3c..dbdce6a0f694 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -273,6 +273,7 @@ int min_free_kbytes = 1024; > int user_min_free_kbytes = -1; > static int watermark_boost_factor __read_mostly = 15000; > static int watermark_scale_factor = 10; > +static int highatomic_reserve_ratio = 10; > > /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */ > int movable_zone; > @@ -2046,7 +2047,8 @@ static void reserve_highatomic_pageblock(struct page *page, int order, > */ > if ((zone_managed_pages(zone) / 100) < pageblock_nr_pages) > return; > - max_managed = ALIGN((zone_managed_pages(zone) / 100), pageblock_nr_pages); > + max_managed = ALIGN((zone_managed_pages(zone) * highatomic_reserve_ratio / 1000), > + pageblock_nr_pages); > if (zone->nr_reserved_highatomic >= max_managed) > return; > > @@ -6199,6 +6201,13 @@ static const struct ctl_table page_alloc_sysctl_table[] = { > .mode = 0644, > .proc_handler = percpu_pagelist_high_fraction_sysctl_handler, > .extra1 = SYSCTL_ZERO, > + }, > + .procname = "highatomic_reserve_ratio", > + .data = &highatomic_reserve_ratio, > + .maxlen = sizeof(highatomic_reserve_ratio), > + .mode = 0644, > + .proc_handler = proc_dointvec_minmax, > + .extra1 = SYSCTL_ZERO, > }, > { > .procname = "lowmem_reserve_ratio",