On Thu, Feb 1, 2024 at 6:26 AM Michal Koutný <mkoutny@xxxxxxxx> wrote: > > On Fri, Jan 26, 2024 at 09:19:25PM +0000, "T.J. Mercier" <tjmercier@xxxxxxxxxx> wrote: > > The root memcg is onlined even when memcg is disabled. When it's onlined > > a 2 second periodic stat flush is started, but no stat flushing is > > required when memcg is disabled because there can be no child memcgs. > > Most calls to flush memcg stats are avoided when memcg is disabled as a > > result of the mem_cgroup_disabled check added in 7d7ef0a4686a > > ("mm: memcg: restore subtree stats flushing"), but the periodic flushing > > started in mem_cgroup_css_online is not. Skip it. > > Have you tried > --- a/kernel/cgroup/cgroup.c > +++ b/kernel/cgroup/cgroup.c > @@ -6099,6 +6099,9 @@ int __init cgroup_init(void) > cgroup_unlock(); > > for_each_subsys(ss, ssid) { > + if (!cgroup_ssid_enabled(ssid)) > + continue; > + > if (ss->early_init) { > struct cgroup_subsys_state *css = > init_css_set.subsys[ss->id]; > @@ -6118,9 +6121,6 @@ int __init cgroup_init(void) > * disabled flag and cftype registration needs kmalloc, > * both of which aren't available during early_init. > */ > - if (!cgroup_ssid_enabled(ssid)) > - continue; > - > if (cgroup1_ssid_disabled(ssid)) > pr_info("Disabling %s control group subsystem in v1 mounts\n", > ss->legacy_name); > ? > I'm asking about a try because I'm not sure whether this does not blow > up due to something missing. But I think disabled controllers would not > need to be (root-)onlined at all. > > Thanks, > Michal Hi Michal, It does blow up, but not how I was expecting. There's a null pointer dereference inside find_css_set when trying to get a css pointer for the memory controller, I think because the allocation in cgroup_init_subsys is skipped: [ 9.591766] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 9.591909] #PF: supervisor read access in kernel mode [ 9.592023] #PF: error_code(0x0000) - not-present page [ 9.592138] PGD 0 P4D 0 [ 9.592199] Oops: 0000 [#1] PREEMPT SMP PTI [ 9.592289] CPU: 1 PID: 1 Comm: init Tainted: G E 6.7.0-mainline-maybe-dirty #1 [ 9.592490] Hardware name: emulation qemu-x86/qemu-x86, BIOS 2023.07-rc6-gb8315a3184b2-ab11347395 07/01/2023 [ 9.592731] RIP: 0010:find_css_set+0x5c3/0x7a0 [ 9.592850] Code: 20 69 cf b2 4e 8b 3c 33 48 c7 c7 1d 3b 41 b2 4c 89 fe e8 10 b0 f7 00 49 8b b4 24 a0 00 00 00 4e 8d 24 2b 49 81 c4 b0 fc ff ff <49> 8b 0f 4c 01 e9 48 c7 c7 c4 c0 46 b2 4c 89 e2 e8 e8 af f7 00 49 [ 9.593251] RSP: 0018:ffffb6218000bb90 EFLAGS: 00010087 [ 9.593370] RAX: 0000000000000021 RBX: ffff99181044a200 RCX: 4f5e789f089a0c00 [ 9.593554] RDX: ffffb6218000ba50 RSI: ffffffffb2448284 RDI: ffffffffb2c91950 [ 9.593735] RBP: ffffb6218000bc28 R08: 0000000000000fff R09: ffffffffb2c79950 [ 9.593909] R10: 0000000000002ffd R11: 0000000000000004 R12: ffff99181044a2d8 [ 9.594102] R13: 0000000000000428 R14: 0000000000000020 R15: 0000000000000000 [ 9.594291] FS: 00007f3d2f986fc8(0000) GS:ffff99182bd00000(0000) knlGS:0000000000000000 [ 9.594467] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 9.594610] CR2: 0000000000000000 CR3: 00000001001d8001 CR4: 0000000000370eb0 [ 9.594818] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 9.595007] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 9.595178] Call Trace: [ 9.595237] <TASK> [ 9.595297] ? __die_body+0x5e/0xb0 [ 9.595392] ? __die+0x9e/0xb0 [ 9.595481] ? page_fault_oops+0x35f/0x3d0 [ 9.595574] ? do_user_addr_fault+0x6ab/0x780 [ 9.595690] ? prb_read_valid+0x28/0x60 [ 9.595782] ? exc_page_fault+0x83/0x220 [ 9.595874] ? asm_exc_page_fault+0x2b/0x30 [ 9.595966] ? find_css_set+0x5c3/0x7a0 [ 9.596059] cgroup_migrate_prepare_dst+0x75/0x2b0 [ 9.596194] cgroup_attach_task+0x293/0x450 [ 9.596305] ? cgroup_attach_task+0xb6/0x450 [ 9.596449] __cgroup_procs_write+0xef/0x1a0 [ 9.596589] cgroup_procs_write+0x16/0x30 [ 9.596733] cgroup_file_write+0x9d/0x260 [ 9.596840] kernfs_fop_write_iter+0x145/0x1e0 [ 9.596981] vfs_write+0x276/0x2e0 [ 9.597092] ksys_write+0x73/0xe0 [ 9.597198] __x64_sys_write+0x1a/0x30 [ 9.597303] do_syscall_64+0x5a/0x100 [ 9.597430] entry_SYSCALL_64_after_hwframe+0x6e/0x76 But there are also calls to mem_cgroup_css_from_folio that I think would cause a different null pointer deref even if we had the css but no root_mem_cgroup. There seems to be an assumption that we'll have a memcg to charge against even when the controller is disabled. To me it looks like that's to simplify the possible combinations of CONFIG_MEMCG and memcg being boot-time disabled or not. Best, T.J.