Hey Michal,

> Do you have any example OOM reports? [..]

Sure, here is one on a 1TiB, 128-physical core machine running a
5.10-based kernel (sorry, it reads pretty awkwardly when wrapped):

---8<---
mytask invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
<...>
oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=sdc,mems_allowed=0-1,global_oom,task_memcg=/sdc,task=mytask,pid=835214,uid=0
Out of memory: Killed process 835214 (mytask) total-vm:787716604kB, anon-rss:787536152kB, file-rss:64kB, shmem-rss:0kB, UID:0 pgtables:1541224kB oom_score_adj:0, hugetlb-usage:0kB
Mem-Info:
active_anon:320 inactive_anon:198083493 isolated_anon:0
 active_file:128283 inactive_file:290086 isolated_file:0
 unevictable:3525 dirty:15 writeback:0
 slab_reclaimable:35505 slab_unreclaimable:272917
 mapped:46414 shmem:822 pagetables:64085088
 sec_pagetables:0 bounce:0
 kernel_misc_reclaimable:0
 free:325793 free_pcp:263277 free_cma:0
Node 0 active_anon:1112kB inactive_anon:268172556kB active_file:270992kB inactive_file:254612kB unevictable:12404kB isolated(anon):0kB isolated(file):0kB mapped:147240kB dirty:52kB writeback:0kB shmem:304kB shmem_thp:0kB shmem_pmdmapped:0kB anon_thp:1310720kB writeback_tmp:0kB kernel_stack:32000kB pagetables:255483108kB sec_pagetables:0kB all_unreclaimable? yes
Node 1 active_anon:168kB inactive_anon:524161416kB active_file:242140kB inactive_file:905732kB unevictable:1696kB isolated(anon):0kB isolated(file):0kB mapped:38416kB dirty:8kB writeback:0kB shmem:2984kB shmem_thp:0kB shmem_pmdmapped:0kB anon_thp:267732992kB writeback_tmp:0kB kernel_stack:8520kB pagetables:857244kB sec_pagetables:0kB all_unreclaimable? yes
Node 0 Crash free:72kB min:108kB low:220kB high:332kB reserved_highatomic:0KB active_anon:0kB inactive_anon:111940kB active_file:280kB inactive_file:316kB unevictable:0kB writepending:4kB present:114284kB managed:114196kB mlocked:0kB bounce:0kB free_pcp:1528kB local_pcp:24kB free_cma:0kB
lowmem_reserve[]: 0 0 0 0
Node 0 DMA32 free:66592kB min:2580kB low:5220kB high:7860kB reserved_highatomic:0KB active_anon:8kB inactive_anon:19456kB active_file:4kB inactive_file:224kB unevictable:0kB writepending:0kB present:2643512kB managed:2643512kB mlocked:0kB bounce:0kB free_pcp:8040kB local_pcp:244kB free_cma:0kB
lowmem_reserve[]: 0 0 16029 16029
Node 0 Normal free:513048kB min:513192kB low:1038700kB high:1564208kB reserved_highatomic:0KB active_anon:1104kB inactive_anon:268040520kB active_file:270708kB inactive_file:254072kB unevictable:12404kB writepending:48kB present:533969920kB managed:525510968kB mlocked:12344kB bounce:0kB free_pcp:790040kB local_pcp:7060kB free_cma:0kB
lowmem_reserve[]: 0 0 0 0
Node 1 Normal free:723460kB min:755656kB low:1284080kB high:1812504kB reserved_highatomic:0KB active_anon:168kB inactive_anon:524161416kB active_file:242140kB inactive_file:905732kB unevictable:1696kB writepending:8kB present:536866816kB managed:528427664kB mlocked:1588kB bounce:0kB free_pcp:253500kB local_pcp:12kB free_cma:0kB
lowmem_reserve[]: 0 0 0 0
Node 0 Crash: 0*4kB 0*8kB 1*16kB (M) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 16kB
Node 0 DMA32: 80*4kB (UME) 74*8kB (UE) 23*16kB (UME) 21*32kB (UME) 40*64kB (UE) 35*128kB (UME) 3*256kB (UE) 9*512kB (UME) 13*1024kB (UM) 19*2048kB (UME) 0*4096kB = 66592kB
Node 0 Normal: 1999*4kB (UE) 259*8kB (UM) 465*16kB (UM) 114*32kB (UE) 54*64kB (UME) 14*128kB (U) 74*256kB (UME) 128*512kB (UE) 96*1024kB (U) 56*2048kB (U) 46*4096kB (U) = 512292kB
Node 1 Normal: 2280*4kB (UM) 12667*8kB (UM) 8859*16kB (UME) 5221*32kB (UME) 1631*64kB (UME) 899*128kB (UM) 330*256kB (UME) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 723208kB
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
420675 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap  = 268435456kB
Total swap = 268435456kB
---8<---

Free memory in the Node 0 and Node 1 Normal zones is below the
respective min watermarks, with 790040kB + 253500kB ~= 1GiB of memory
sitting on pcp lists. With this patch, the GFP_HIGHUSER_MOVABLE +
unrestricted mems_allowed allocation would have been able to access all
of that memory, very likely avoiding the OOM.

> [..] There were recent changes to scale
> the pcp pages and it would be good to know whether they work reasonably
> well even under memory pressure.

I'm not familiar with these changes, but a quick check of recent
activity points to v6.7 commit fa8c4f9a665b ("mm: fix draining remote
pageset"); is this what you are referring to?

Thanks, and have a great day,
Zach

>
> I am not objecting to the patch discussed here but it would be really
> good to understand the underlying problem and the scale of it.
>
> Thanks!
> --
> Michal Hocko
> SUSE Labs
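
The watermark and pcp figures quoted from the report can be double-checked with a short sketch. The numbers are copied from the "Node 0 Normal" and "Node 1 Normal" zone lines; the dictionary layout and the function names are only illustrative and do not correspond to anything in the kernel.

```python
# Values in kB, copied from the "Node N Normal" zone lines of the OOM report.
# The structure and helper names here are hypothetical, for illustration only.
ZONES = {
    "Node 0 Normal": {"free": 513048, "min": 513192, "free_pcp": 790040},
    "Node 1 Normal": {"free": 723460, "min": 755656, "free_pcp": 253500},
}

KB_PER_GIB = 1024 * 1024


def shortfall_kb(zone):
    """How far 'free' sits below the min watermark (positive means below)."""
    return zone["min"] - zone["free"]


def pcp_total_kb(zones):
    """Total memory parked on per-cpu page lists across the given zones."""
    return sum(z["free_pcp"] for z in zones.values())


if __name__ == "__main__":
    for name, zone in ZONES.items():
        print(f"{name}: free is {shortfall_kb(zone)} kB below min, "
              f"free_pcp is {zone['free_pcp']} kB")
    total = pcp_total_kb(ZONES)
    print(f"pcp total: {total} kB ~= {total / KB_PER_GIB:.2f} GiB")
```

In both zones the amount held on pcp lists is far larger than the gap between free and min, which is the point the arithmetic in the reply is making.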