+Tejun Heo

On Wed, May 12, 2021 at 3:48 AM NOMURA JUNICHI(野村 淳一)
<junichi.nomura@xxxxxxx> wrote:
>
> v5.13-rc1 sometimes causes NULL pointer dereference during kdump, where
> memcg is disabled with "cgroup_disable=memory" boot option.
> I haven't seen this problem with v5.12, so it looks like regression.
>
> [ 73.199590] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [ 73.206593] #PF: supervisor write access in kernel mode
> [ 73.211845] #PF: error_code(0x0002) - not-present page
> [ 73.217010] PGD 0 P4D 0
> [ 73.219556] Oops: 0002 [#1] SMP NOPTI
> [ 73.223236] CPU: 0 PID: 95 Comm: kswapd0 Tainted: G I 5.13.0-rc1 #1
> [ 73.239418] RIP: 0010:do_shrink_slab+0x85/0x2d0
> [ 73.243977] Code: 49 63 44 24 04 be 00 00 00 00 49 8b 4c 24 18 f6 c2 02 48 0f 44 c6 48 85 c9 74 09 83 e2 04 0f 85 19 02 00 00 49 8b 4f 38 31 d2 <48> 87 14 c1 48 89 55 b8 41 8b 77 18 4c 89 f0 85 f6 0f 84 82 01 00
> [ 73.262856] RSP: 0018:ffffc900001abc18 EFLAGS: 00010246
> [ 73.268108] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> [ 73.275281] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000064
> [ 73.282454] RBP: ffffc900001abc70 R08: 28f5c28f5c28f5c3 R09: 0000000000000000
> [ 73.289628] R10: 0000000000000000 R11: 0000000000000004 R12: ffffc900001abca0
> [ 73.296800] R13: 0000000000000400 R14: 0000000000000002 R15: ffff88805344bc10
> [ 73.303972] FS: 0000000000000000(0000) GS:ffff888072c00000(0000) knlGS:0000000000000000
> [ 73.312108] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 73.317883] CR2: 0000000000000000 CR3: 000000005cf68004 CR4: 00000000007706b0
> [ 73.325055] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 73.332227] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 73.339400] PKRU: 55555554
> [ 73.342117] Call Trace:
> [ 73.344576] shrink_slab+0xa9/0x2b0
> [ 73.348083] ? __update_load_avg_se+0x298/0x320
> [ 73.352640] shrink_node+0x248/0x6f0
> [ 73.356234] balance_pgdat+0x303/0x5f0
> [ 73.360002] kswapd+0x20b/0x390
> [ 73.363157] ? finish_wait+0x80/0x80
> [ 73.366752] ? balance_pgdat+0x5f0/0x5f0
> [ 73.370693] kthread+0x124/0x140
> [ 73.373937] ? kthread_park+0x90/0x90
> [ 73.377617] ret_from_fork+0x1f/0x30
> [ 73.381215] Modules linked in: xfs libcrc32c sd_mod t10_pi sr_mod cdrom sg crc32c_intel ahci libahci libata smartpqi scsi_transport_sas overlay squashfs loop
> [ 73.395386] CR2: 0000000000000000
> [ 73.398716] ---[ end trace 9752d71309d33c00 ]---
>
> The code around do_shrink_slab+0x85 is:
> 0xffffffff9d094925 <do_shrink_slab+0x65>: mov 0x18(%r12),%rcx
> 0xffffffff9d09492a <do_shrink_slab+0x6a>: test $0x2,%dl
> 0xffffffff9d09492d <do_shrink_slab+0x6d>: cmove %rsi,%rax
> 0xffffffff9d094931 <do_shrink_slab+0x71>: test %rcx,%rcx
> 0xffffffff9d094934 <do_shrink_slab+0x74>: je 0xffffffff9d09493f <do_shrink_slab+0x7f>
> 0xffffffff9d094936 <do_shrink_slab+0x76>: and $0x4,%edx
> 0xffffffff9d094939 <do_shrink_slab+0x79>: jne 0xffffffff9d094b58 <do_shrink_slab+0x298>
> 0xffffffff9d09493f <do_shrink_slab+0x7f>: mov 0x38(%r15),%rcx
> 0xffffffff9d094943 <do_shrink_slab+0x83>: xor %edx,%edx
> 0xffffffff9d094945 <do_shrink_slab+0x85>: xchg %rdx,(%rcx,%rax,8)
>
> The NULL dereference occurred at here in in-lined xchg_nr_deferred():
>
> return atomic_long_xchg(&shrinker->nr_deferred[nid], 0);
>
> that means "shrinker->nr_deferred" was NULL.
>
> Though I haven't fully bisected between v5.12 and v5.13-rc1, I can reproduce
> the problem with this commit:
>
> 476b30a0949a mm: vmscan: don't need allocate shrinker->nr_deferred for memcg aware shrinkers
>
> but not with this previous commit:
>
> 867508304685 mm: vmscan: use per memcg nr_deferred of shrinker
>
> With the commit 476b30a0949a, if a memcg-aware shrinker is registered before
> cgroup_init(), shrinker->nr_deferred is NULL. However xchg_nr_deferred()
> tries to use it as memcg is turned off via "cgroup_disable=memory".
>
> Any thoughts?

Is there a way to find the call chain of "memcg-aware shrinker is
registered before cgroup_init()"?

Irrespective of that, I think we can revert a3e72739b7a7e ("cgroup: fix
too early usage of static_branch_disable()") as 6041186a3258 ("init:
initialize jump labels before command line option parsing") has moved
the initialization of jump labels before command line parsing.
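
To make the ordering problem described in the report concrete, here is a
rough userspace model of it. This is only a sketch, not kernel code: every
model_* name, the flag value, and the assumption that registration can see
the "memcg disabled" state only after cgroup_init() has run are
illustrative simplifications of what the report describes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical flag value, chosen for this model only. */
#define MODEL_SHRINKER_MEMCG_AWARE 0x4u

struct model_shrinker {
	unsigned int flags;
	long *nr_deferred;	/* per-node counters; NULL if never allocated */
};

/*
 * Assumption of this model: "cgroup_disable=memory" only becomes visible
 * once cgroup_init() has run, i.e. this is still false during early boot.
 */
static bool model_memcg_disabled;

/*
 * Registration as the report describes it for 476b30a0949a: a memcg-aware
 * shrinker that does not yet see memcg as disabled skips the allocation
 * of shrinker->nr_deferred.
 */
static int model_prealloc_shrinker(struct model_shrinker *s, int nr_nodes)
{
	if ((s->flags & MODEL_SHRINKER_MEMCG_AWARE) && !model_memcg_disabled) {
		s->nr_deferred = NULL;
		return 0;
	}

	/* memcg is known to be off: fall back to the plain per-node array. */
	s->flags &= ~MODEL_SHRINKER_MEMCG_AWARE;
	s->nr_deferred = calloc((size_t)nr_nodes, sizeof(*s->nr_deferred));
	return s->nr_deferred ? 0 : -1;
}

/*
 * True if the reclaim path would index shrinker->nr_deferred[nid] while
 * the pointer is NULL, which is the dereference seen in the oops.
 */
static bool model_would_oops(const struct model_shrinker *s)
{
	bool use_memcg_path = (s->flags & MODEL_SHRINKER_MEMCG_AWARE) &&
			      !model_memcg_disabled;

	return !use_memcg_path && s->nr_deferred == NULL;
}
```

In this model, a shrinker registered while model_memcg_disabled is still
false ends up with nr_deferred == NULL, and once the flag flips to true
the reclaim path falls through to the NULL array. A shrinker registered
after the flip gets the array allocated and is fine.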