On Wed, Aug 17, 2022 at 07:50:13PM +0300, Gražvydas Ignotas wrote:
> On Tue, Aug 16, 2022 at 9:52 PM Gražvydas Ignotas <notasas@xxxxxxxxx> wrote:
> > Basically, when there is git activity in the container with a memory
> > limit, other processes in the same container start to suffer (very)
> > occasional network issues (mostly DNS lookup failures).
>
> ok I've traced this and it's failing in try_charge_memcg(), which
> doesn't seem to be trying too hard because it's called from irq
> context.
>
> Here is the backtrace:
> <IRQ>
> ? fib_validate_source+0xb4/0x100
> ? ip_route_input_slow+0xa11/0xb70
> mem_cgroup_charge_skmem+0x4b/0xf0
> __sk_mem_raise_allocated+0x17f/0x3e0
> __udp_enqueue_schedule_skb+0x220/0x270
> udp_queue_rcv_one_skb+0x330/0x5e0
> udp_unicast_rcv_skb+0x75/0x90
> __udp4_lib_rcv+0x1ba/0xca0
> ? ip_rcv_finish_core.constprop.0+0x63/0x490
> ip_protocol_deliver_rcu+0xd6/0x230
> ip_local_deliver_finish+0x73/0xa0
> __netif_receive_skb_one_core+0x8b/0xa0
> process_backlog+0x8e/0x120
> __napi_poll+0x2c/0x160
> net_rx_action+0x2a2/0x360
> ? rebalance_domains+0xeb/0x3b0
> __do_softirq+0xeb/0x2eb
> __irq_exit_rcu+0xb9/0x110
> sysvec_apic_timer_interrupt+0xa2/0xd0
> </IRQ>
>
> Calling mem_cgroup_print_oom_meminfo() in such a case reveals:
>
> memory: usage 7812476kB, limit 7812500kB, failcnt 775198
> swap: usage 0kB, limit 0kB, failcnt 0
> Memory cgroup stats for
> /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podb8f4f0e9_fb95_4f2d_8443_e6a78f235c9a.slice/docker-9e7cad93b2e0774d49148474989b41fe6d67a5985d059d08d9d64495f1539a81.scope:
> anon 348016640
> file 7502163968
> kernel 146997248
> kernel_stack 327680
> pagetables 2224128
> percpu 0
> sock 4096
> vmalloc 0
> shmem 0
> zswap 0
> zswapped 0
> file_mapped 112041984
> file_dirty 1181028352
> file_writeback 2686976
> swapcached 0
> anon_thp 44040192
> file_thp 0
> shmem_thp 0
> inactive_anon 350756864
> active_anon 36864
> inactive_file 3614003200
> active_file 3888070656
> unevictable 0
> slab_reclaimable 143692600
> slab_unreclaimable 545120
> slab 144237720
> workingset_refault_anon 0
> workingset_refault_file 2318
> workingset_activate_anon 0
> workingset_activate_file 2318
> workingset_restore_anon 0
> workingset_restore_file 0
> workingset_nodereclaim 0
> pgfault 334152
> pgmajfault 1238
> pgrefill 3400
> pgscan 819608
> pgsteal 791005
> pgactivate 949122
> pgdeactivate 1694
> pglazyfree 0
> pglazyfreed 0
> zswpin 0
> zswpout 0
> thp_fault_alloc 709
> thp_collapse_alloc 0
>
> So it basically renders UDP inoperable because of disk cache. I hope
> this is not the intended behavior. Naturally booting with
> cgroup.memory=nosocket solves this issue.

This is most likely a regression caused by this patch:

commit 4b1327be9fe57443295ae86fe0fcf24a18469e9f
Author: Wei Wang <weiwan@xxxxxxxxxx>
Date:   Tue Aug 17 12:40:03 2021 -0700

    net-memcg: pass in gfp_t mask to mem_cgroup_charge_skmem()

    Add gfp_t mask as an input parameter to mem_cgroup_charge_skmem(),
    to give more control to the networking stack and enable it to
    change memcg charging behavior. In the future, the networking
    stack may decide to avoid oom-kills when fallbacks are more
    appropriate.

    One behavior change in mem_cgroup_charge_skmem() by this patch is
    to avoid force charging by default and let the caller decide when
    and if force charging is needed through the presence or absence of
    __GFP_NOFAIL.

    Signed-off-by: Wei Wang <weiwan@xxxxxxxxxx>
    Reviewed-by: Shakeel Butt <shakeelb@xxxxxxxxxx>
    Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>
We never used to fail these allocations. Cgroups don't have a
kswapd-style watermark reclaimer, so the network has historically
relied on force-charging and leaving reclaim to allocations that can
block. Now it seems network packets can just keep failing indefinitely.

The changelog is a bit terse for such a drastic change in behavior.
Wei, Shakeel, can you fill in why this was changed? Can we revert it
for the time being?
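
To spell out how I read the change described in that changelog, here is
a toy userspace model of the old vs. new policy. This is not kernel
code; every name in it (struct memcg_model, charge_attempt(),
MODEL_NOFAIL, charge_skmem_old/new()) is made up for illustration, the
real logic lives in mm/memcontrol.c and net/core/sock.c.

/*
 * Toy model of the memcg socket-charging policy change.  Not kernel
 * code; all identifiers here are invented for illustration.
 */
#include <stdbool.h>
#include <stdio.h>

struct memcg_model { long usage, limit; };

#define MODEL_NOFAIL 0x1		/* stand-in for __GFP_NOFAIL */

static bool charge_attempt(struct memcg_model *m, long nr, unsigned int flags)
{
	if (m->usage + nr <= m->limit || (flags & MODEL_NOFAIL)) {
		m->usage += nr;		/* NOFAIL overshoots the limit */
		return true;
	}
	return false;
}

/* Old behavior: if over the limit, charge anyway (force-charge) and
 * return false so the caller knows the group went over. */
static bool charge_skmem_old(struct memcg_model *m, long nr)
{
	if (charge_attempt(m, nr, 0))
		return true;
	charge_attempt(m, nr, MODEL_NOFAIL);
	return false;
}

/* New behavior: force-charge only when the caller asks for NOFAIL;
 * otherwise the charge simply fails. */
static bool charge_skmem_new(struct memcg_model *m, long nr, unsigned int flags)
{
	return charge_attempt(m, nr, flags);
}

int main(void)
{
	struct memcg_model m = { .usage = 95, .limit = 100 };

	/* Old: skb memory is charged even though the group is at its limit. */
	printf("old: ok=%d usage=%ld\n", charge_skmem_old(&m, 10), m.usage);

	/* New, without NOFAIL (the softirq case): the charge just fails
	 * and the UDP receive path above drops the packet. */
	m.usage = 95;
	printf("new: ok=%d usage=%ld\n", charge_skmem_new(&m, 10, 0), m.usage);
	return 0;
}

If I read the callers right, the UDP receive path in the backtrace
above is exactly the "no NOFAIL" case: from softirq context the charge
is attempted with GFP_NOWAIT only, so once page cache has pushed the
group to its limit, every socket charge fails until something else
reclaims.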