On Fri, Jun 03, 2022 at 07:19:43AM +0300, Vasily Averin wrote:
> __register_pernet_operations() executes init hook of registered
> pernet_operation structure in all existing net namespaces.
>
> Typically, these hooks are called by a process associated with
> the specified net namespace, and all __GFP_ACCOUNT marked
> allocation are accounted for corresponding container/memcg.
>
> However __register_pernet_operations() calls the hooks in the same
> context, and as a result all marked allocations are accounted
> to one memcg for all processed net namespaces.
>
> This patch adjusts active memcg for each net namespace and helps
> to account memory allocated inside ops_init() into the proper memcg.
>
> Signed-off-by: Vasily Averin <vvs@xxxxxxxxxx>
> Acked-by: Roman Gushchin <roman.gushchin@xxxxxxxxx>
> Acked-by: Shakeel Butt <shakeelb@xxxxxxxxxx>
> ---
> v6: re-based to current upstream (v5.18-11267-gb00ed48bb0a7)

Hello!

I'm unable to boot my sparc64 VM anymore (5.19 still boots, 6.0-rc1 does not).
I bisected it down to this patch:

mator@ttip:~/linux-2.6$ git bisect bad
1d0403d20f6c281cb3d14c5f1db5317caeec48e9 is the first bad commit
commit 1d0403d20f6c281cb3d14c5f1db5317caeec48e9
Author: Vasily Averin <vvs@xxxxxxxxxx>
Date:   Fri Jun 3 07:19:43 2022 +0300

    net: set proper memcg for net_init hooks allocations

    __register_pernet_operations() executes init hook of registered
    pernet_operation structure in all existing net namespaces.

    Typically, these hooks are called by a process associated with
    the specified net namespace, and all __GFP_ACCOUNT marked
    allocation are accounted for corresponding container/memcg.

    However __register_pernet_operations() calls the hooks in the same
    context, and as a result all marked allocations are accounted
    to one memcg for all processed net namespaces.

    This patch adjusts active memcg for each net namespace and helps
    to account memory allocated inside ops_init() into the proper memcg.

    Link: https://lkml.kernel.org/r/f9394752-e272-9bf9-645f-a18c56d1c4ec@xxxxxxxxxx
    Signed-off-by: Vasily Averin <vvs@xxxxxxxxxx>
    Acked-by: Roman Gushchin <roman.gushchin@xxxxxxxxx>
    Acked-by: Shakeel Butt <shakeelb@xxxxxxxxxx>
    Cc: Michal Koutný <mkoutny@xxxxxxxx>
    Cc: Vlastimil Babka <vbabka@xxxxxxx>
    Cc: Michal Hocko <mhocko@xxxxxxxx>
    Cc: Florian Westphal <fw@xxxxxxxxx>
    Cc: David S. Miller <davem@xxxxxxxxxxxxx>
    Cc: Jakub Kicinski <kuba@xxxxxxxxxx>
    Cc: Paolo Abeni <pabeni@xxxxxxxxxx>
    Cc: Eric Dumazet <edumazet@xxxxxxxxxx>
    Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
    Cc: Kefeng Wang <wangkefeng.wang@xxxxxxxxxx>
    Cc: Linux Kernel Functional Testing <lkft@xxxxxxxxxx>
    Cc: Muchun Song <songmuchun@xxxxxxxxxxxxx>
    Cc: Naresh Kamboju <naresh.kamboju@xxxxxxxxxx>
    Cc: Qian Cai <quic_qiancai@xxxxxxxxxxx>
    Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>

 include/linux/memcontrol.h | 47 +++++++++++++++++++++++++++++++++++++++++++++-
 net/core/net_namespace.c   |  7 +++++++
 2 files changed, 53 insertions(+), 1 deletion(-)

getting the following kernel OOPS:

[ 0.000010] PROMLIB: Sun IEEE Boot Prom 'OBP 4.38.17 2019/01/25 08:22'
[ 0.000028] PROMLIB: Root node compatible: sun4v
[ 0.000070] Linux version 5.19.0-rc2-00025-g1d0403d20f6c (mator@ttip) (gcc (Debian 12.2.0-2) 12.2.0, GNU ld (GNU Binutils for Debian) 2.38.90.20220713) #376 SMP Sun Sep 18 02:22:43 MSK 2022
[ 0.000098] printk: debug: skip boot console de-registration.
[ 0.000438] printk: bootconsole [earlyprom0] enabled
[ 0.000491] ARCH: SUN4V
[ 0.000534] Ethernet address: 00:14:4f:fa:06:f2
[ 0.000583] MM: PAGE_OFFSET is 0xfff8000000000000 (max_phys_bits == 47)
[ 0.000644] MM: VMALLOC [0x0000000100000000 --> 0x0006000000000000]
[ 0.000704] MM: VMEMMAP [0x0006000000000000 --> 0x000c000000000000]
[ 0.014651] Kernel: Using 5 locked TLB entries for main kernel image.
[ 0.014719] Remapping the kernel...
[ 0.014750] done.
[ 0.033774] OF stdout device is: /virtual-devices@100/console@1
[ 0.033838] PROM: Built device tree with 67601 bytes of memory.
[ 0.033896] MDESC: Size is 24208 bytes.
[ 0.033989] PLATFORM: banner-name [SPARC T5-2]
[ 0.034034] PLATFORM: name [ORCL,SPARC-T5-2]
[ 0.034076] PLATFORM: hostid [84fa06f2]
[ 0.034113] PLATFORM: serial# [0035260e]
[ 0.034154] PLATFORM: stick-frequency [3b9aca00]
[ 0.034196] PLATFORM: mac-address [144ffa06f2]
[ 0.034238] PLATFORM: watchdog-resolution [1000 ms]
[ 0.034284] PLATFORM: watchdog-max-timeout [31536000000 ms]
[ 0.034335] PLATFORM: max-cpus [1024]
[ 0.034419] Top of RAM: 0x42f948000, Total RAM: 0x3ff3a0000
[ 0.034474] Memory hole size: 773MB
[ 0.036430] Allocated 24576 bytes for kernel page tables.
[ 0.036506] Zone ranges:
[ 0.036541] Normal [mem 0x0000000030400000-0x000000042f947fff]
[ 0.036602] Movable zone start for each node
[ 0.036645] Early memory node ranges
[ 0.036679] node 0: [mem 0x0000000030400000-0x000000006febffff]
[ 0.036738] node 0: [mem 0x000000006ff40000-0x000000006ff65fff]
[ 0.036796] node 0: [mem 0x0000000070000000-0x000000042f8b1fff]
[ 0.036854] node 0: [mem 0x000000042f940000-0x000000042f947fff]
[ 0.036912] Initmem setup node 0 [mem 0x0000000030400000-0x000000042f947fff]
[ 0.046980] On node 0, zone Normal: 98816 pages in unavailable ranges
[ 0.047007] On node 0, zone Normal: 64 pages in unavailable ranges
[ 0.048447] On node 0, zone Normal: 77 pages in unavailable ranges
[ 0.048516] On node 0, zone Normal: 71 pages in unavailable ranges
[ 0.050336] On node 0, zone Normal: 33628 pages in unavailable ranges
[ 0.050400] Booting Linux...
[ 0.050500] CPU CAPS: [flush,stbar,swap,muldiv,v9,blkinit,n2,mul32]
[ 0.050581] CPU CAPS: [div32,v8plus,popc,vis,vis2,ASIBlkInit,fmaf,vis3]
[ 0.050663] CPU CAPS: [hpc,ima,pause,cbcond,aes,des,kasumi,camellia]
[ 0.050744] CPU CAPS: [md5,sha1,sha256,sha512,mpmul,montmul,montsqr,crc32c]
[ 0.093786] percpu: Embedded 18 pages/cpu s105824 r8192 d33440 u262144
[ 0.095225] SUN4V: Mondo queue sizes [cpu(131072) dev(16384) r(8192) nr(256)]
[ 0.095510] Built 1 zonelists, mobility grouping on. Total pages: 2077148
[ 0.095587] Kernel command line: BOOT_IMAGE=/vmlinux-5.19.0-rc2-00025-g1d0403d20f6c root=/dev/vdiska2 ro keep_bootcon
[ 0.095745] Unknown kernel command line parameters "BOOT_IMAGE=/vmlinux-5.19.0-rc2-00025-g1d0403d20f6c", will be passed to user space.
[ 0.095851] printk: log_buf_len individual max cpu contribution: 4096 bytes
[ 0.095914] printk: log_buf_len total cpu_extra contributions: 1044480 bytes
[ 0.095973] printk: log_buf_len min size: 131072 bytes
[ 0.097772] printk: log_buf_len: 2097152 bytes
[ 0.097818] printk: early log buf free: 126264(96%)
[ 0.099466] Dentry cache hash table entries: 2097152 (order: 11, 16777216 bytes, linear)
[ 0.100365] Inode-cache hash table entries: 1048576 (order: 10, 8388608 bytes, linear)
[ 0.100439] Sorting __ex_table...
[ 0.100692] mem auto-init: stack:off, heap alloc:off, heap free:off
[ 0.105101] Memory: 1259512K/16764544K available (8962K kernel code, 1702K rwdata, 3048K rodata, 632K init, 3160K bss, 289008K reserved, 0K cma-reserved)
[ 0.108565] SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=256, Nodes=1
[ 0.109364] ftrace: allocating 27588 entries in 54 pages
[ 0.120238] ftrace: allocated 54 pages with 4 groups
[ 0.120513] trace event string verifier disabled
[ 0.124589] rcu: Hierarchical RCU implementation.
[ 0.124642] rcu: RCU debug extended QS entry/exit.
[ 0.124689] Rude variant of Tasks RCU enabled.
[ 0.124733] Tracing variant of Tasks RCU enabled.
[ 0.124778] rcu: RCU calculated value of scheduler-enlistment delay is 26 jiffies.
[ 0.131351] NR_IRQS: 2048, nr_irqs: 2048, preallocated irqs: 1
[ 0.131438] SUN4V: Using IRQ API major 3, cookie only virqs enabled
[ 0.135353] rcu: srcu_init: Setting srcu_struct sizes to big.
[ 0.135477] clocksource: stick: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[ 0.135579] clocksource: mult[800000] shift[23]
[ 0.135626] clockevent: mult[80000000] shift[31]
[ 0.136279] Console: colour dummy device 80x25
[ 0.136333] printk: console [tty0] enabled
[ 0.136393] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
[ 0.136482] ... MAX_LOCKDEP_SUBCLASSES: 8
[ 0.136536] ... MAX_LOCK_DEPTH: 48
[ 0.136589] ... MAX_LOCKDEP_KEYS: 8192
[ 0.136645] ... CLASSHASH_SIZE: 4096
[ 0.136699] ... MAX_LOCKDEP_ENTRIES: 16384
[ 0.136756] ... MAX_LOCKDEP_CHAINS: 32768
[ 0.136811] ... CHAINHASH_SIZE: 16384
[ 0.136868] memory used by lock dependency info: 2603 kB
[ 0.136933] per task-struct memory footprint: 1920 bytes
[ 0.215908] Calibrating delay using timer specific routine.. 2007.88 BogoMIPS (lpj=4015778)
[ 0.216049] pid_max: default: 262144 minimum: 2048
[ 0.216772] LSM: Security Framework initializing
[ 0.217017] Unable to handle kernel paging request at virtual address 000612000002e000
[ 0.217116] tsk->{mm,active_mm}->context = 0000000000000000
[ 0.217184] tsk->{mm,active_mm}->pgd = fff8000070002000
[ 0.217247] \|/ ____ \|/
[ 0.217247] "@'/ .. \`@"
[ 0.217247] /_| \__/ |_\
[ 0.217247]    \__U_/
[ 0.217406] swapper/0(0): Oops [#1]
[ 0.217458] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.19.0-rc2-00025-g1d0403d20f6c #376
[ 0.217559] TSTATE: 0000009180001607 TPC: 00000000006c9118 TNPC: 00000000006c911c Y: df1f6831 Not tainted
[ 0.217673] TPC: <mem_cgroup_from_obj+0x78/0x120>
[ 0.217742] g0: 0000000000000000 g1: 0000004000000a89 g2: 0006000000000000 g3: 54256f3ea00db3c0
[ 0.217843] g4: 0000000000fdf680 g5: fff800042960e000 g6: 0000000000fc0000 g7: 0000000000000002
[ 0.217943] o0: 000612000002f688 o1: 0000000000fdffa0 o2: 22645555e843a019 o3: 24f02a9c57a00000
[ 0.218043] o4: 000000000000000d o5: 9b8bf183d547acad sp: 0000000000fc3191 ret_pc: 00000000006c90c8
[ 0.218145] RPC: <mem_cgroup_from_obj+0x28/0x120>
[ 0.218207] l0: 00000000011f31c0 l1: 0000000000000000 l2: 0000000000000000 l3: ffffffffffffffff
[ 0.218309] l4: ffffffff0000003c l5: 00000000014e3800 l6: 0000000000000000 l7: 0000000000fdac00
[ 0.218409] i0: 0000000001512d80 i1: 0000000000000000 i2: 0000000000000000 i3: 0000000000000002
[ 0.218509] i4: 00000000011f31c0 i5: 0000000000000000 i6: 0000000000fc3241 i7: 0000000000ae012c
[ 0.218609] I7: <__register_pernet_operations+0xcc/0x420>
[ 0.218681] Call Trace:
[ 0.218718] [<0000000000ae012c>] __register_pernet_operations+0xcc/0x420
[ 0.218800] [<0000000000ae04e4>] register_pernet_operations+0x64/0xa0
[ 0.218878] [<0000000000ae053c>] register_pernet_subsys+0x1c/0x40
[ 0.218955] [<0000000001199010>] net_ns_init+0xe8/0x148
[ 0.219028] [<0000000001170ed4>] start_kernel+0x5e0/0x660
[ 0.219096] [<0000000001173e28>] start_early_boot+0x2a0/0x2b0
[ 0.219169] [<0000000000cb6fe0>] tlb_fixup_done+0x4c/0x6c
[ 0.219240] [<0000000000027414>] 0x27414
[ 0.219293] Disabling lock debugging due to kernel taint
[ 0.219345] Caller[0000000000ae012c]: __register_pernet_operations+0xcc/0x420
[ 0.220423] Caller[0000000000ae04e4]: register_pernet_operations+0x64/0xa0
[ 0.220490] Caller[0000000000ae053c]: register_pernet_subsys+0x1c/0x40
[ 0.220551] Caller[0000000001199010]: net_ns_init+0xe8/0x148
[ 0.220608] Caller[0000000001170ed4]: start_kernel+0x5e0/0x660
[ 0.220664] Caller[0000000001173e28]: start_early_boot+0x2a0/0x2b0
[ 0.220723] Caller[0000000000cb6fe0]: tlb_fixup_done+0x4c/0x6c
[ 0.220780] Caller[0000000000027414]: 0x27414
[ 0.220823] Instruction DUMP:
[ 0.220825] 90020001
[ 0.220858] 912a3003
[ 0.220886] 90020002
[ 0.220912] <c25a2008>
[ 0.220939] 84086001
[ 0.220967] 82007fff
[ 0.220993] 83788408
[ 0.221020] 90100001
[ 0.221047] c25a0000
[ 0.221074]
[ 0.221120] Kernel panic - not syncing: Attempted to kill the idle task!
[ 0.221183] Unable to handle kernel NULL pointer dereference
[ 0.221237] tsk->{mm,active_mm}->context = 0000000000000000
[ 0.221287] tsk->{mm,active_mm}->pgd = fff8000070002000
[ 0.221335] \|/ ____ \|/
[ 0.221335] "@'/ .. \`@"
[ 0.221335] /_| \__/ |_\
[ 0.221335]    \__U_/
[ 0.221457] swapper/0(0): Oops [#2]
[ 0.221494] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G D 5.19.0-rc2-00025-g1d0403d20f6c #376
[ 0.221580] TSTATE: 0000004480e01607 TPC: 0000000000a64030 TNPC: 0000000000a64034 Y: 000008a3 Tainted: G D
[ 0.221678] TPC: <sunhv_migrate_hvcons_irq+0x30/0x60>
[ 0.221731] g0: 00000000014e3800 g1: 0000000000000020 g2: 0000000000000000 g3: 000000000000009d
[ 0.221808] g4: 0000000000fdf680 g5: fff800042960e000 g6: 0000000000fc0000 g7: 0000000000000001
[ 0.222888] o0: 000000000000003c o1: 0000000000cc9400 o2: 0000000000000000 o3: 0000000000ece2a0
[ 0.222966] o4: 6c65207461736b21 o5: 0000000000000000 sp: 0000000000fc2b21 ret_pc: 00000000004dbfdc
[ 0.223046] RPC: <vprintk+0x5c/0x80>
[ 0.223087] l0: 0000000001228e40 l1: 0000000000000020 l2: 0000000000eceb78 l3: 0000000f477791df
[ 0.223167] l4: f477792d02f140eb l5: 00000000014e3800 l6: 0000000000000000 l7: 0000000000000001
[ 0.223243] i0: 0000000000000000 i1: 0000000000fc3508 i2: 0000000000eceb78 i3: 0000000000fc35c8
[ 0.223320] i4: 0000000000a1c888 i5: 0000000001229220 i6: 0000000000fc2bd1 i7: 0000000000440a1c
[ 0.223397] I7: <smp_send_stop+0x3c/0x100>
[ 0.223443] Call Trace:
[ 0.223470] [<0000000000440a1c>] smp_send_stop+0x3c/0x100
[ 0.223522] [<0000000000cac4a0>] panic+0x104/0x374
[ 0.223572] [<000000000046a4fc>] make_task_dead+0x5c/0xe0
[ 0.223629] [<0000000000cab660>] die_if_kernel+0x258/0x264
[ 0.223681] [<0000000000cc3624>] unhandled_fault+0x98/0xb4
[ 0.223737] [<0000000000cc3e54>] do_sparc64_fault+0x814/0xa00
[ 0.223792] [<0000000000407714>] sparc64_realfault_common+0x10/0x20
[ 0.223858] [<00000000006c9118>] mem_cgroup_from_obj+0x78/0x120
[ 0.223914] [<0000000000ae012c>] __register_pernet_operations+0xcc/0x420
[ 0.223976] [<0000000000ae04e4>] register_pernet_operations+0x64/0xa0
[ 0.224038] [<0000000000ae053c>] register_pernet_subsys+0x1c/0x40
[ 0.224094] [<0000000001199010>] net_ns_init+0xe8/0x148
[ 0.224147] [<0000000001170ed4>] start_kernel+0x5e0/0x660
[ 0.224198] [<0000000001173e28>] start_early_boot+0x2a0/0x2b0
[ 0.224254] [<0000000000cb6fe0>] tlb_fixup_done+0x4c/0x6c
[ 0.225308] [<0000000000027414>] 0x27414
[ 0.225349] Caller[0000000000440a1c]: smp_send_stop+0x3c/0x100
[ 0.225406] Caller[0000000000cac4a0]: panic+0x104/0x374
[ 0.225456] Caller[000000000046a4fc]: make_task_dead+0x5c/0xe0
[ 0.225512] Caller[0000000000cab660]: die_if_kernel+0x258/0x264
[ 0.225567] Caller[0000000000cc3624]: unhandled_fault+0x98/0xb4
[ 0.225624] Caller[0000000000cc3e54]: do_sparc64_fault+0x814/0xa00
[ 0.225685] Caller[0000000000407714]: sparc64_realfault_common+0x10/0x20
[ 0.225747] Caller[00000000006c90c8]: mem_cgroup_from_obj+0x28/0x120
[ 0.225806] Caller[0000000000ae012c]: __register_pernet_operations+0xcc/0x420
[ 0.225875] Caller[0000000000ae04e4]: register_pernet_operations+0x64/0xa0
[ 0.225940] Caller[0000000000ae053c]: register_pernet_subsys+0x1c/0x40
[ 0.226001] Caller[0000000001199010]: net_ns_init+0xe8/0x148
[ 0.226058] Caller[0000000001170ed4]: start_kernel+0x5e0/0x660
[ 0.226113] Caller[0000000001173e28]: start_early_boot+0x2a0/0x2b0
[ 0.226172] Caller[0000000000cb6fe0]: tlb_fixup_done+0x4c/0x6c
[ 0.226228] Caller[0000000000027414]: 0x27414
[ 0.226271] Instruction DUMP:
[ 0.226273] 83287005
[ 0.226305] 13003325
[ 0.226333] 82204018
[ 0.226359] <d000a0d8>
[ 0.226385] 92126358
[ 0.226412] 7fe9f0e2
[ 0.226439] 92024001
[ 0.226465] 81cfe008
[ 0.226492] 01000000
[ 0.226519]
[ 0.226562] Kernel panic - not syncing: Attempted to kill the idle task!
[ 0.226626] Unable to handle kernel NULL pointer dereference
[ 0.226678] tsk->{mm,active_mm}->context = 0000000000000000
[ 0.226729] tsk->{mm,active_mm}->pgd = fff8000070002000
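
As a quick sanity check on the first oops, here is a minimal Python sketch that classifies the faulting address against the "MM:" ranges printed early in the log. Only the numeric values come from the log; the helper name region_of and the half-open treatment of the ranges are assumptions made for illustration. The address appears to land inside the VMEMMAP range, which would be consistent with mem_cgroup_from_obj() reading a struct page entry that is not mapped, though this alone does not establish the root cause.

```python
# Address ranges copied from the "MM:" lines of the boot log.
PAGE_OFFSET = 0xfff8000000000000
VMALLOC = (0x0000000100000000, 0x0006000000000000)
VMEMMAP = (0x0006000000000000, 0x000c000000000000)

# Faulting address from "Unable to handle kernel paging request" (Oops #1).
FAULT_ADDR = 0x000612000002e000


def region_of(addr: int) -> str:
    """Name the logged address range containing addr.

    Hypothetical helper: ranges are treated as half-open [start, end),
    which is an assumption, not something the log states.
    """
    if addr >= PAGE_OFFSET:
        return "linear map"
    if VMEMMAP[0] <= addr < VMEMMAP[1]:
        return "vmemmap"
    if VMALLOC[0] <= addr < VMALLOC[1]:
        return "vmalloc"
    return "other"


if __name__ == "__main__":
    print(hex(FAULT_ADDR), "->", region_of(FAULT_ADDR))
    print("offset into vmemmap:", hex(FAULT_ADDR - VMEMMAP[0]))
```

Register g2 in the first oops holds 0006000000000000, the same value as the logged VMEMMAP base, which fits this reading.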