On Fri, 24 Nov 2023, Charan Teja Kalla wrote:

> __alloc_pages_direct_reclaim() is called from slowpath allocation where
> high atomic reserves can be unreserved after there is a progress in
> reclaim and yet no suitable page is found. Later should_reclaim_retry()
> gets called from slow path allocation to decide if the reclaim needs to
> be retried before OOM kill path is taken.
>
> should_reclaim_retry() checks the available(reclaimable + free pages)
> memory against the min wmark levels of a zone and returns:
> a) true, if it is above the min wmark so that slow path allocation will
> do the reclaim retries.
> b) false, thus slowpath allocation takes oom kill path.
>
> should_reclaim_retry() can also unreserves the high atomic reserves
> **but only after all the reclaim retries are exhausted.**
>
> In a case where there are almost none reclaimable memory and free pages
> contains mostly the high atomic reserves but allocation context can't
> use these high atomic reserves, makes the available memory below min
> wmark levels hence false is returned from should_reclaim_retry() leading
> the allocation request to take OOM kill path. This can turn into a early
> oom kill if high atomic reserves are holding lot of free memory and
> unreserving of them is not attempted.
>
> (early)OOM is encountered on a VM with the below state:
> [ 295.998653] Normal free:7728kB boost:0kB min:804kB low:1004kB
> high:1204kB reserved_highatomic:8192KB active_anon:4kB inactive_anon:0kB
> active_file:24kB inactive_file:24kB unevictable:1220kB writepending:0kB
> present:70732kB managed:49224kB mlocked:0kB bounce:0kB free_pcp:688kB
> local_pcp:492kB free_cma:0kB
> [ 295.998656] lowmem_reserve[]: 0 32
> [ 295.998659] Normal: 508*4kB (UMEH) 241*8kB (UMEH) 143*16kB (UMEH)
> 33*32kB (UH) 7*64kB (UH) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB
> 0*4096kB = 7752kB
>
> Per above log, the free memory of ~7MB exist in the high atomic
> reserves is not freed up before falling back to oom kill path.
>
> Fix it by trying to unreserve the high atomic reserves in
> should_reclaim_retry() before __alloc_pages_direct_reclaim() can
> fallback to oom kill path.
>
> Fixes: 0aaa29a56e4f ("mm, page_alloc: reserve pageblocks for high-order atomic allocations on demand")
> Reported-by: Chris Goldsworthy <quic_cgoldswo@xxxxxxxxxxx>
> Suggested-by: Michal Hocko <mhocko@xxxxxxxx>
> Acked-by: Michal Hocko <mhocko@xxxxxxxx>
> Signed-off-by: Charan Teja Kalla <quic_charante@xxxxxxxxxxx>

Acked-by: David Rientjes <rientjes@xxxxxxxxxx>
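
For anyone following the arithmetic in the quoted log, here is a rough userspace sketch of the min-watermark comparison, not the kernel code. The struct, the function names and the choice of reclaimable pages (active/inactive anon plus file) are assumptions made for illustration; lowmem_reserve and per-order checks are ignored.

```c
#include <stdbool.h>

/* Hypothetical, simplified snapshot of a zone; all values in kB. */
struct zone_snapshot {
    long free_kb;                /* "free" from the log */
    long reclaimable_kb;         /* assumed: active/inactive anon + file */
    long min_wmark_kb;           /* "min" from the log */
    long reserved_highatomic_kb; /* "reserved_highatomic" from the log */
};

/*
 * Simplified stand-in for the check described in the changelog:
 * available = free + reclaimable, minus the high atomic reserve when
 * the allocation context cannot use that reserve.
 */
static bool available_above_min(const struct zone_snapshot *z,
                                bool can_use_highatomic)
{
    long available = z->free_kb + z->reclaimable_kb;

    if (!can_use_highatomic)
        available -= z->reserved_highatomic_kb;

    return available > z->min_wmark_kb;
}

/*
 * Models unreserving: the pages stay free, they are just no longer
 * accounted as a high atomic reserve.
 */
static void unreserve_highatomic(struct zone_snapshot *z)
{
    z->reserved_highatomic_kb = 0;
}

/* Values copied from the "Normal" zone line in the quoted log. */
static struct zone_snapshot normal_zone_from_log(void)
{
    struct zone_snapshot z = {
        .free_kb = 7728,
        .reclaimable_kb = 4 + 0 + 24 + 24,
        .min_wmark_kb = 804,
        .reserved_highatomic_kb = 8192,
    };
    return z;
}
```

Under these assumptions the available memory works out to 7728 + 52 - 8192 = -412kB, which is below the 804kB min watermark, so the check fails and the OOM path is taken. With the reserve released first, 7780kB is compared against 804kB and the check passes, so reclaim would be retried instead.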