On Mon, Oct 30, 2023 at 06:09:50PM +0530, Charan Teja Kalla wrote:
> __alloc_pages_direct_reclaim() is called from the slowpath allocation,
> where the high atomic reserves can be unreserved after there is
> progress in reclaim and yet no suitable page is found. Later,
> should_reclaim_retry() is called from the slowpath allocation to decide
> if reclaim needs to be retried before the OOM kill path is taken.
>
> should_reclaim_retry() checks the available (reclaimable + free)
> memory against the min wmark levels of a zone and returns:
> a) true, if it is above the min wmark, so that the slowpath allocation
> will do the reclaim retries.
> b) false, so the slowpath allocation takes the OOM kill path.
>
> should_reclaim_retry() can also unreserve the high atomic reserves,
> but only after all the reclaim retries are exhausted.
>
> In a case where there is almost no reclaimable memory and the free
> pages consist mostly of the high atomic reserves, but the allocation
> context can't use these reserves, the available memory is below the
> min wmark levels, hence false is returned from should_reclaim_retry()
> and the allocation request takes the OOM kill path. This is an early
> OOM kill, because the high atomic reserves are holding a lot of free
> memory and unreserving them is not even attempted.
>
> Such an (early) OOM is encountered on a machine in the below state
> (excerpt from the OOM kill logs):
> [ 295.998653] Normal free:7728kB boost:0kB min:804kB low:1004kB
> high:1204kB reserved_highatomic:8192KB active_anon:4kB inactive_anon:0kB
> active_file:24kB inactive_file:24kB unevictable:1220kB writepending:0kB
> present:70732kB managed:49224kB mlocked:0kB bounce:0kB free_pcp:688kB
> local_pcp:492kB free_cma:0kB
> [ 295.998656] lowmem_reserve[]: 0 32
> [ 295.998659] Normal: 508*4kB (UMEH) 241*8kB (UMEH) 143*16kB (UMEH)
> 33*32kB (UH) 7*64kB (UH) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB
> 0*4096kB = 7752kB
>
> Per the above log, ~7MB of free memory sits in the high atomic
> reserves and is not freed up before falling back to the OOM kill path.
>
> This fix unreserves these atomic reserves in the OOM path before going
> for a kill. The side effect of unreserving in the OOM kill path is
> that these free pages are checked against the high wmark; if
> unreserved from should_reclaim_retry()/__alloc_pages_direct_reclaim(),
> they are checked against the min wmark levels.
>
> Signed-off-by: Charan Teja Kalla <quic_charante@xxxxxxxxxxx>

Thanks for the detailed commit description. Really helpful in
understanding the problem you are fixing.

> ---
>  mm/page_alloc.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 95546f3..2a2536d 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3281,6 +3281,8 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
>                 .order = order,
>         };
>         struct page *page;
> +       struct zone *zone;
> +       struct zoneref *z;
>
>         *did_some_progress = 0;
>
> @@ -3295,6 +3297,16 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
>         }
>
>         /*
> +        * If should_reclaim_retry() encounters a state where
> +        * reclaimable + free doesn't satisfy the wmark levels,
> +        * it can directly jump to OOM without even unreserving
> +        * the highatomic page blocks. Try them once here
> +        * before jumping to OOM.
> +        */
> +retry:
> +       unreserve_highatomic_pageblock(ac, true);
> +

Not possible to fix this in should_reclaim_retry()?
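Something along these lines, perhaps (a rough, untested sketch against
the current should_reclaim_retry(); the "out" label is an assumption,
added just before the function's final return):

        if (*no_progress_loops > MAX_RECLAIM_RETRIES)
                goto out;

        /* ... existing per-zone min-wmark checks that may set ret ... */

out:
        /*
         * About to report failure: exhaust the highatomic reserves one
         * last time and retry reclaim if anything was unreserved.
         */
        if (!ret && unreserve_highatomic_pageblock(ac, true))
                ret = true;

        return ret;

That way the reserves are given back before the allocation ever reaches
the OOM path, and (per your changelog) the freed pages are still checked
against the min wmark rather than the high wmark.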
> +       /*
>          * Go through the zonelist yet one more time, keep very high watermark
>          * here, this is only to catch a parallel oom killing, we must fail if
>          * we're still under heavy pressure. But make sure that this reclaim
> @@ -3307,6 +3319,12 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
>         if (page)
>                 goto out;
>
> +       for_each_zone_zonelist_nodemask(zone, z, ac->zonelist, ac->highest_zoneidx,
> +                                       ac->nodemask) {
> +               if (zone->nr_reserved_highatomic > 0)
> +                       goto retry;
> +       }
> +
>         /* Coredumps can quickly deplete all memory reserves */
>         if (current->flags & PF_DUMPCORE)
>                 goto out;
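Also, unless I'm misreading it, unreserve_highatomic_pageblock() returns
as soon as it has unreserved a single pageblock, which is presumably why
you loop back via the retry label while any zone still has
nr_reserved_highatomic left. Written as a loop, your control flow is
roughly the below (sketch only; zones_have_highatomic_reserves() is a
made-up name for your for_each_zone_zonelist_nodemask() check):

        do {
                /* Gives back at most one highatomic pageblock per pass. */
                unreserve_highatomic_pageblock(ac, true);

                /* Re-run the very-high-watermark check from above. */
                page = get_page_from_freelist((gfp_mask | __GFP_HARDWALL) &
                                              ~__GFP_DIRECT_RECLAIM, order,
                                              ALLOC_WMARK_HIGH | ALLOC_CPUSET, ac);
                if (page)
                        goto out;
        } while (zones_have_highatomic_reserves(ac));

So the retry is bounded by the number of reserved pageblocks; that might
be worth a line in the changelog.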