On 10/27/24 21:17, Yu Zhao wrote:
> On Sun, Oct 27, 2024 at 1:53 PM Vlastimil Babka <vbabka@xxxxxxx> wrote:
>>
>> On 10/26/24 05:36, Yu Zhao wrote:
>> > OOM kills due to vastly overestimated free highatomic reserves were
>> > observed:
>> >
>> >   ... invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0 ...
>> >   Node 0 Normal free:1482936kB boost:0kB min:410416kB low:739404kB high:1068392kB reserved_highatomic:1073152KB ...
>> >   Node 0 Normal: 1292*4kB (ME) 1920*8kB (E) 383*16kB (UE) 220*32kB (ME) 340*64kB (E) 2155*128kB (UE) 3243*256kB (UE) 615*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 1477408kB
>> >
>> > The second line above shows that the OOM kill was due to the following
>> > condition:
>> >
>> >   free (1482936kB) - reserved_highatomic (1073152kB) = 409784KB < min (410416kB)
>> >
>> > And the third line shows there were no free pages in any
>> > MIGRATE_HIGHATOMIC pageblocks, which otherwise would show up as type
>> > 'H'. Therefore __zone_watermark_unusable_free() underestimated the
>> > usable free memory by over 1GB, which resulted in the unnecessary OOM
>> > kill above.
>> >
>> > The comments in __zone_watermark_unusable_free() warns about the
>> > potential risk, i.e.,
>> >
>> >   If the caller does not have rights to reserves below the min
>> >   watermark then subtract the high-atomic reserves. This will
>> >   over-estimate the size of the atomic reserve but it avoids a search.
>> >
>> > However, it is possible to keep track of free pages in reserved
>> > highatomic pageblocks with a new per-zone counter nr_free_highatomic
>> > protected by the zone lock, to avoid a search when calculating the
>>
>> It's only possible to track this reliably since the "mm: page_alloc:
>> freelist migratetype hygiene" patchset was merged, which explains why
>> nr_reserved_highatomic was used until now, even if it's imprecise.
>
> I just refreshed my memory by quickly going through the discussion
> around that series and didn't find anything that helps me understand
> the above. More pointers please?

For example:

- a page is on pcplist in MIGRATE_MOVABLE list
- we reserve its pageblock as highatomic, which does nothing to the page on
  the pcplist
- page above is flushed from pcplist to zone freelist, but it remembers it
  was MIGRATE_MOVABLE, merges with another buddy/buddies from the
  now-highatomic list, the resulting order-X page ends up on the movable
  freelist despite being in highatomic pageblock. The counter of free
  highatomic is now wrong wrt the freelist reality

The series has addressed various scenarios like that, where a page can end up
on the wrong freelist.
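
To make the scenario above concrete, here is a toy userspace model in C. It
is only a sketch: struct toy_zone, toy_account() and toy_scenario() are
made-up names for illustration, not kernel code, and the model assumes a
single order-0 buddy and a counter that is adjusted according to whatever
migratetype the freeing path is handed.

```c
/* Toy model only: none of these types or helpers exist in the kernel. */
enum toy_mt { TOY_MOVABLE, TOY_HIGHATOMIC, TOY_NR_MT };

struct toy_zone {
	long freelist_pages[TOY_NR_MT];	/* base pages on each freelist */
	long nr_free_highatomic;	/* the counter under discussion */
};

/* Add (+) or remove (-) pages on the freelist of type mt, with accounting. */
static void toy_account(struct toy_zone *z, enum toy_mt mt, long delta)
{
	z->freelist_pages[mt] += delta;
	if (mt == TOY_HIGHATOMIC)
		z->nr_free_highatomic += delta;
}

/*
 * Replay the three steps from the list above and return
 * (counter - pages really on the highatomic freelist).
 *
 * use_cached_type != 0: the flush trusts the migratetype the page
 *                       remembered when it went onto the pcplist.
 * use_cached_type == 0: the flush uses the pageblock's current type.
 */
long toy_scenario(int use_cached_type)
{
	struct toy_zone z = { { 0, 0 }, 0 };
	enum toy_mt pageblock_mt = TOY_MOVABLE;
	enum toy_mt cached_mt = TOY_MOVABLE;	/* page A sits on the pcplist */
	enum toy_mt mt;

	/* Page A's order-0 buddy B is free on the movable freelist. */
	toy_account(&z, TOY_MOVABLE, 1);

	/* Reserve the pageblock: B moves lists, A on the pcplist is untouched. */
	toy_account(&z, TOY_MOVABLE, -1);
	toy_account(&z, TOY_HIGHATOMIC, 1);
	pageblock_mt = TOY_HIGHATOMIC;

	/* Flush A: B physically leaves the highatomic list to merge with A... */
	mt = use_cached_type ? cached_mt : pageblock_mt;
	z.freelist_pages[TOY_HIGHATOMIC] -= 1;
	/* ...but the counter only follows the type the flush believes in. */
	if (mt == TOY_HIGHATOMIC)
		z.nr_free_highatomic -= 1;
	/* The merged order-1 page (2 base pages) lands on the list for mt. */
	toy_account(&z, mt, 2);

	return z.nr_free_highatomic - z.freelist_pages[TOY_HIGHATOMIC];
}
```

With the stale type, toy_scenario(1) returns 1: the counter still claims one
free highatomic page while the highatomic freelist is empty and both pages
sit on the movable list. With the pageblock's current type, toy_scenario(0)
returns 0, i.e. counter and freelist agree.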