(Adding Mel to Cc list)

On Thu, Aug 22, 2024 at 9:02 PM Matt Fleming <mfleming@xxxxxxxxxxxxxx> wrote:
>
> Hey there,
>
> I'm seeing page allocation failures across the Cloudflare fleet,
> typically during the network RX path, when trying to allocate order-0
> pages in interrupt context. The machines appear to be under memory
> pressure because the code that gets interrupted is
> shrink_folio_list(). Below is an example stacktrace.
>
> Does anyone have any pointers on how to dig into this some more? It
> appears as though the machines are not able to reclaim memory fast
> enough when under pressure. Happy to provide more metrics or stats on
> request.
>
> Thanks,
> Matt
>
> ----8<----
>
> kswapd1: page allocation failure: order:0, mode:0x820(GFP_ATOMIC),
> nodemask=(null),cpuset=/,mems_allowed=0-7
> CPU: 10 PID: 696 Comm: kswapd1 Kdump: loaded Tainted: G O
> 6.6.43-CUSTOM #1
> Hardware name: MACHINE
> Call Trace:
> <IRQ>
> dump_stack_lvl+0x3c/0x50
> warn_alloc+0x13a/0x1c0
> __alloc_pages_slowpath.constprop.0+0xc9d/0xd10
> ? srso_alias_return_thunk+0x5/0xfbef5
> ? __alloc_pages_bulk+0x3a0/0x630
> __alloc_pages+0x327/0x340
> __napi_alloc_skb+0x16d/0x1f0
> bnxt_rx_page_skb+0x96/0x1b0 [bnxt_en]
> bnxt_rx_pkt+0x201/0x15e0 [bnxt_en]
> ? skb_release_data+0x14f/0x1b0
> __bnxt_poll_work+0x156/0x2b0 [bnxt_en]
> bnxt_poll+0xd9/0x1c0 [bnxt_en]
> ? srso_alias_return_thunk+0x5/0xfbef5
> __napi_poll+0x2b/0x1b0
> bpf_trampoline_6442524138+0x7d/0x1000
> __napi_poll+0x5/0x1b0
> net_rx_action+0x342/0x740
> ? srso_alias_return_thunk+0x5/0xfbef5
> handle_softirqs+0xcf/0x2b0
> irq_exit_rcu+0x6c/0x90
> sysvec_apic_timer_interrupt+0x72/0x90
> </IRQ>
> <TASK>
> asm_sysvec_apic_timer_interrupt+0x1a/0x20
> RIP: 0010:queued_spin_lock_slowpath+0x260/0x2b0
> Code: 83 e0 03 83 ea 01 48 c1 e0 04 48 63 d2 48 05 c0 30 03 00 48 03
> 04 d5 a0 d7 10 9c 48 89 28 8b 45 08 85 c0 75 09 f3 90 8b 45 08 <85> c0
> 74 f7 48 8b 55 00 48 85 d2 74 83 0f 0d 0a e9 7b ff ff ff 65
> RSP: 0018:ffffc9000f9cb768 EFLAGS: 00000246
> RAX: 0000000000000000 RBX: ffff88905a3a9880 RCX: 0000000000000001
> RDX: 000000000000001b RSI: 0000000000700000 RDI: ffff88905a3a9880
> RBP: ffff88902f5330c0 R08: ffffc9000f9cb750 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000603fce623320 R12: 00000000002c0000
> R13: 0000000000000001 R14: 00000000002c0000 R15: ffff889062f84a00
> zs_malloc+0x9d/0x520 [zsmalloc]
> ? srso_alias_return_thunk+0x5/0xfbef5
> ? __zstd_compress+0x60/0xa0 [zstd]
> zram_submit_bio+0x8d1/0x9f0 [zram]
> ? srso_alias_return_thunk+0x5/0xfbef5
> __submit_bio+0xaa/0x160
> submit_bio_noacct_nocheck+0x145/0x380
> ? submit_bio_noacct+0x24/0x4c0
> submit_bio_wait+0x5b/0xc0
> swap_writepage_bdev_sync+0xf8/0x170
> ? __pfx_submit_bio_wait_endio+0x10/0x10
> swap_writepage+0x36/0x80
> pageout+0xc8/0x240
> shrink_folio_list+0x489/0xd60
> shrink_lruvec+0x5a8/0xc40
> shrink_node+0x2c5/0x7a0
> balance_pgdat+0x32d/0x740
> kswapd+0x205/0x400
> ? __pfx_autoremove_wake_function+0x10/0x10
> ? __pfx_kswapd+0x10/0x10
> kthread+0xe8/0x120
> ? __pfx_kthread+0x10/0x10
> ret_from_fork+0x34/0x50
> ? __pfx_kthread+0x10/0x10
> ret_from_fork_asm+0x1b/0x30
> </TASK>
> Mem-Info:
> active_anon:14289951 inactive_anon:25056935 isolated_anon:1577
> active_file:3254095 inactive_file:3963476 isolated_file:1
> unevictable:4 dirty:305545 writeback:132
> slab_reclaimable:2916775 slab_unreclaimable:1689088
> mapped:2592762 shmem:1980658 pagetables:530605
> sec_pagetables:0 bounce:0
> kernel_misc_reclaimable:0
> free:618653 free_pcp:129763 free_cma:0
> Node 0 active_anon:6461468kB inactive_anon:11667080kB
> active_file:1971908kB inactive_file:2302944kB unevictable:0kB
> isolated(anon):960kB isolated(file):0kB mapped:1070000kB
> dirty:110140kB writeback:64kB shmem:842272kB shmem_thp:0kB
> shmem_pmdmapped:0kB anon_thp:0kB writeback_tmp:0kB
> kernel_stack:37624kB pagetables:235212kB sec_pagetables:0kB
> all_unreclaimable? no
> Node 1 active_anon:7027824kB inactive_anon:12544448kB
> active_file:1695500kB inactive_file:2093056kB unevictable:0kB
> isolated(anon):308kB isolated(file):0kB mapped:1694880kB
> dirty:163436kB writeback:24kB shmem:1090692kB shmem_thp:0kB
> shmem_pmdmapped:0kB anon_thp:0kB writeback_tmp:0kB
> kernel_stack:31860kB pagetables:231608kB sec_pagetables:0kB
> all_unreclaimable? no
> Node 2 active_anon:7168612kB inactive_anon:11850084kB
> active_file:1669812kB inactive_file:1870596kB unevictable:0kB
> isolated(anon):144kB isolated(file):0kB mapped:1420628kB
> dirty:105912kB writeback:24kB shmem:1092068kB shmem_thp:0kB
> shmem_pmdmapped:0kB anon_thp:0kB writeback_tmp:0kB
> kernel_stack:40220kB pagetables:263428kB sec_pagetables:0kB
> all_unreclaimable? no
> Node 3 active_anon:7160892kB inactive_anon:12851880kB
> active_file:1453156kB inactive_file:1884092kB unevictable:0kB
> isolated(anon):452kB isolated(file):0kB mapped:1199768kB
> dirty:124548kB writeback:72kB shmem:965128kB shmem_thp:0kB
> shmem_pmdmapped:0kB anon_thp:2048kB writeback_tmp:0kB
> kernel_stack:27124kB pagetables:284676kB sec_pagetables:0kB
> all_unreclaimable? no
> Node 4 active_anon:7505196kB inactive_anon:12764280kB
> active_file:1466756kB inactive_file:1878740kB unevictable:16kB
> isolated(anon):640kB isolated(file):0kB mapped:1170484kB
> dirty:136668kB writeback:44kB shmem:986212kB shmem_thp:0kB
> shmem_pmdmapped:0kB anon_thp:2048kB writeback_tmp:0kB
> kernel_stack:32380kB pagetables:312216kB sec_pagetables:0kB
> all_unreclaimable? no
> Node 5 active_anon:7169752kB inactive_anon:12867040kB
> active_file:1769832kB inactive_file:1809448kB unevictable:0kB
> isolated(anon):1008kB isolated(file):0kB mapped:1589272kB
> dirty:128616kB writeback:112kB shmem:1108816kB shmem_thp:0kB
> shmem_pmdmapped:0kB anon_thp:0kB writeback_tmp:0kB
> kernel_stack:32784kB pagetables:278392kB sec_pagetables:0kB
> all_unreclaimable? no
> Node 6 active_anon:7333288kB inactive_anon:12854340kB
> active_file:1504536kB inactive_file:2096488kB unevictable:0kB
> isolated(anon):1336kB isolated(file):4kB mapped:1117792kB
> dirty:228512kB writeback:92kB shmem:958680kB shmem_thp:0kB
> shmem_pmdmapped:0kB anon_thp:0kB writeback_tmp:0kB
> kernel_stack:43852kB pagetables:254060kB sec_pagetables:0kB
> all_unreclaimable? no
> Node 7 active_anon:7332772kB inactive_anon:12828588kB
> active_file:1484880kB inactive_file:1918540kB unevictable:0kB
> isolated(anon):1460kB isolated(file):0kB mapped:1108224kB
> dirty:224348kB writeback:96kB shmem:878764kB shmem_thp:0kB
> shmem_pmdmapped:0kB anon_thp:2048kB writeback_tmp:0kB
> kernel_stack:35580kB pagetables:262828kB sec_pagetables:0kB
> all_unreclaimable? no
> Node 0 DMA free:11264kB boost:0kB min:48kB low:60kB high:72kB
> reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
> active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB
> present:15996kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB
> local_pcp:0kB free_cma:0kB
> lowmem_reserve[]: 0 2095 31529 31529
> Node 0 DMA32 free:118988kB boost:0kB min:6832kB low:8976kB
> high:11120kB reserved_highatomic:0KB active_anon:445316kB
> inactive_anon:780792kB active_file:122148kB inactive_file:151592kB
> unevictable:0kB writepending:1464kB present:2735864kB
> managed:2145496kB mlocked:0kB bounce:0kB free_pcp:20468kB
> local_pcp:48kB free_cma:0kB
> lowmem_reserve[]: 0 0 29434 29434
> Node 0 Normal free:266252kB boost:0kB min:95988kB low:126128kB
> high:156268kB reserved_highatomic:305152KB active_anon:6016024kB
> inactive_anon:10884436kB active_file:1849108kB inactive_file:2149856kB
> unevictable:0kB writepending:108740kB present:30670848kB
> managed:30141044kB mlocked:0kB bounce:0kB free_pcp:37432kB
> local_pcp:84kB free_cma:0kB
> lowmem_reserve[]: 0 0 0 0
> Node 1 Normal free:290496kB boost:0kB min:105164kB low:138184kB
> high:171204kB reserved_highatomic:333824KB active_anon:7028084kB
> inactive_anon:12543028kB active_file:1694884kB inactive_file:2092728kB
> unevictable:0kB writepending:163200kB present:33552384kB
> managed:33022704kB mlocked:0kB bounce:0kB free_pcp:53668kB
> local_pcp:892kB free_cma:0kB
> lowmem_reserve[]: 0 0 0 0
> Node 2 Normal free:295000kB boost:0kB min:105172kB low:138196kB
> high:171220kB reserved_highatomic:333824KB active_anon:7168872kB
> inactive_anon:11848752kB active_file:1668876kB inactive_file:1871016kB
> unevictable:0kB writepending:106604kB present:33554432kB
> managed:33024756kB mlocked:0kB bounce:0kB free_pcp:48468kB
> local_pcp:752kB free_cma:0kB
> lowmem_reserve[]: 0 0 0 0
> Node 3 Normal free:308228kB boost:0kB min:105012kB low:137984kB
> high:170956kB reserved_highatomic:333824KB active_anon:7164068kB
> inactive_anon:12847600kB active_file:1453016kB inactive_file:1885952kB
> unevictable:0kB writepending:126480kB present:33553408kB
> managed:32974232kB mlocked:0kB bounce:0kB free_pcp:64400kB
> local_pcp:732kB free_cma:0kB
> lowmem_reserve[]: 0 0 0 0
> Node 4 Normal free:271672kB boost:0kB min:105172kB low:138196kB
> high:171220kB reserved_highatomic:333824KB active_anon:7505196kB
> inactive_anon:12763688kB active_file:1465932kB inactive_file:1880212kB
> unevictable:16kB writepending:137892kB present:33554432kB
> managed:33024756kB mlocked:16kB bounce:0kB free_pcp:60204kB
> local_pcp:632kB free_cma:0kB
> lowmem_reserve[]: 0 0 0 0
> Node 5 Normal free:291824kB boost:0kB min:105168kB low:138188kB
> high:171208kB reserved_highatomic:333824KB active_anon:7169428kB
> inactive_anon:12866872kB active_file:1769184kB inactive_file:1811512kB
> unevictable:0kB writepending:131024kB present:33553408kB
> managed:33023728kB mlocked:0kB bounce:0kB free_pcp:78708kB
> local_pcp:568kB free_cma:0kB
> lowmem_reserve[]: 0 0 0 0
> Node 6 Normal free:310936kB boost:0kB min:105172kB low:138196kB
> high:171220kB reserved_highatomic:333824KB active_anon:7333792kB
> inactive_anon:12852816kB active_file:1503264kB inactive_file:2097500kB
> unevictable:0kB writepending:229284kB present:33554432kB
> managed:33024756kB mlocked:0kB bounce:0kB free_pcp:74936kB
> local_pcp:796kB free_cma:0kB
> lowmem_reserve[]: 0 0 0 0
> Node 7 Normal free:309668kB boost:0kB min:105112kB low:138116kB
> high:171120kB reserved_highatomic:333824KB active_anon:7331892kB
> inactive_anon:12827964kB active_file:1484024kB inactive_file:1920356kB
> unevictable:0kB writepending:226576kB present:33541120kB
> managed:33005940kB mlocked:0kB bounce:0kB free_pcp:80748kB
> local_pcp:704kB free_cma:0kB
> lowmem_reserve[]: 0 0 0 0
> Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
> 1*1024kB (U) 1*2048kB (M) 2*4096kB (M) = 11264kB
> Node 0 DMA32: 2225*4kB (UME) 338*8kB (UME) 178*16kB (UME) 459*32kB
> (UME) 215*64kB (UME) 115*128kB (ME) 86*256kB (UME) 35*512kB (UME)
> 4*1024kB (UM) 6*2048kB (M) 1*4096kB (U) = 118036kB
> Node 0 Normal: 797*4kB (H) 871*8kB (H) 802*16kB (H) 804*32kB (H)
> 601*64kB (H) 310*128kB (H) 164*256kB (H) 67*512kB (H) 25*1024kB (H)
> 14*2048kB (H) 2*4096kB (H) = 265612kB
> Node 1 Normal: 507*4kB (H) 680*8kB (H) 682*16kB (H) 699*32kB (H)
> 589*64kB (H) 363*128kB (H) 211*256kB (H) 93*512kB (H) 37*1024kB (H)
> 13*2048kB (H) 0*4096kB = 291052kB
> Node 2 Normal: 598*4kB (H) 843*8kB (H) 740*16kB (H) 735*32kB (H)
> 507*64kB (H) 298*128kB (H) 175*256kB (H) 102*512kB (H) 37*1024kB (H)
> 21*2048kB (H) 1*4096kB (H) = 297104kB
> Node 3 Normal: 440*4kB (H) 509*8kB (H) 493*16kB (H) 559*32kB (H)
> 438*64kB (H) 304*128kB (H) 197*256kB (H) 126*512kB (H) 50*1024kB (H)
> 21*2048kB (H) 0*4096kB = 307704kB
> Node 4 Normal: 604*4kB (H) 716*8kB (H) 674*16kB (H) 819*32kB (H)
> 544*64kB (H) 303*128kB (H) 182*256kB (H) 74*512kB (H) 24*1024kB (H)
> 20*2048kB (H) 0*4096kB = 268752kB
> Node 5 Normal: 809*4kB (H) 873*8kB (H) 775*16kB (H) 749*32kB (H)
> 414*64kB (H) 254*128kB (H) 154*256kB (H) 90*512kB (H) 37*1024kB (H)
> 31*2048kB (H) 0*4096kB = 292476kB
> Node 6 Normal: 659*4kB (H) 689*8kB (H) 708*16kB (H) 851*32kB (H)
> 592*64kB (H) 386*128kB (H) 226*256kB (H) 91*512kB (H) 40*1024kB (H)
> 13*2048kB (H) 1*4096kB (H) = 310132kB
> Node 7 Normal: 898*4kB (H) 907*8kB (H) 893*16kB (H) 897*32kB (H)
> 597*64kB (H) 375*128kB (H) 203*256kB (H) 86*512kB (H) 29*1024kB (H)
> 20*2048kB (H) 0*4096kB = 306704kB
> Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> Node 4 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> Node 5 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> Node 6 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> Node 7 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> 9214746 total pagecache pages
> 17797 pages in swap cache
> Free swap = 208645424kB
> Total swap = 263402492kB
> 67071581 pages RAM
> 0 pages HighMem/MovableOnly
> 1220888 pages reserved
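
For anyone cross-checking the per-order free lists in the dump above, here
is a minimal sketch that sums one "Node N <zone>:" line and collects the
migratetype letters attached to its blocks. The function name
summarize_free_list() and the regular expressions are my own assumptions
for illustration, not an existing kernel or distro tool. It ignores the
"> " quote markers, so a line that the mail client wrapped can be joined
and pasted in as-is.

```python
import re

def summarize_free_list(line: str):
    """Return (computed_kb, reported_kb, flags) for one per-zone free-list line.

    Hypothetical helper: the format is inferred from the dump quoted above.
    """
    # Each entry looks like "<count>*<size>kB", optionally followed by "(FLAGS)".
    blocks = re.findall(r"(\d+)\*(\d+)kB", line)
    computed = sum(int(count) * int(size) for count, size in blocks)

    # The trailing "= <total>kB" is the total printed on that line.
    match = re.search(r"=\s*(\d+)kB", line)
    reported = int(match.group(1)) if match else None

    # Collect every migratetype letter that appears in parentheses.
    flags = set("".join(re.findall(r"\(([A-Z]+)\)", line)))
    return computed, reported, flags

dma = ("Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB "
       "1*1024kB (U) 1*2048kB (M) 2*4096kB (M) = 11264kB")
normal0 = ("Node 0 Normal: 797*4kB (H) 871*8kB (H) 802*16kB (H) 804*32kB (H) "
           "601*64kB (H) 310*128kB (H) 164*256kB (H) 67*512kB (H) 25*1024kB (H) "
           "14*2048kB (H) 2*4096kB (H) = 265612kB")

for line in (dma, normal0):
    print(summarize_free_list(line))
```

Run against the two lines above, the computed and printed totals agree
(11264kB and 265612kB), and the only letter on the Node 0 Normal blocks
is H.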