Hi Nish,

On Thu, Nov 01, 2018 at 12:06:44PM -0700, Tejun Heo wrote:
> ---------- Forwarded message ---------
> From: Nishanth Aravamudan <naravamudan@xxxxxxxxxxxxxxxx>
> Date: Thu, Nov 1, 2018 at 3:03 PM
> Subject: Kernel panic when enabling cgroup2 io controller at runtime
> To: Tejun Heo <tj@xxxxxxxxxx>, Li Zefan <lizefan@xxxxxxxxxx>, Johannes
> Weiner <hannes@xxxxxxxxxxx>
> Cc: <cgroups@xxxxxxxxxxxxxxx>
>
>
> Hi,
>
> tl;dr: I see a kernel NULL pointer dereference with Linus' master
> (7c6c54b5) when enabling the IO cgroup2 controller at runtime. Is this
> PEBKAC and if so what config option am I missing?

I don't think you're missing anything. I had a patch series merged that
changed blkcg to do more accurate accounting, but it seems I didn't
handle all the cases correctly. My guess is that this is what caused
the oops. The series has been reverted in b5f2954d30c7; the original
patch series is [1].

>
> [ 1015.243027] BUG: unable to handle kernel NULL pointer dereference at
> 0000000000000000
> [ 1015.250913] PGD 0 P4D 0
> [ 1015.253480] Oops: 0000 [#1] SMP PTI
> [ 1015.256997] CPU: 64 PID: 4129 Comm: monit Kdump: loaded Not tainted
> 4.19.0+ #3
> [ 1015.264231] Hardware name: Dell Inc. PowerEdge R640/0W23H8, BIOS 1.2.11
> 10/19/2017
> [ 1015.271819] RIP: 0010:get_request+0x133/0x8b0
> [ 1015.276184] Code: ff ff ff 41 f7 d4 48 89 85 78 ff ff ff 4c 01 f8 41 83
> c4 02 48 89 45 90 44 89 a5 74 ff ff ff 4d 8b 27 48 85 db 49 8b 44 24 18
> <48> 8b 00 48 89 855
> [ 1015.294963] RSP: 0018:ffffa4455abef9c0 EFLAGS: 00010086
> [ 1015.300196] RAX: 0000000000000000 RBX: ffff92cbf02ce900 RCX:
> 0000000000000001
> [ 1015.307337] RDX: 000031193f839fe8 RSI: 0000000000000800 RDI:
> ffff92cbeaaf8080
> [ 1015.314480] RBP: ffffa4455abefa68 R08: 0000000000600000 R09:
> ffff92cbe5ee89b0
> [ 1015.321622] R10: ffffa4455abefb28 R11: 0000000000001000 R12:
> ffff92cbe5248000
> [ 1015.328763] R13: 0000000000000001 R14: 0000000000000040 R15:
> ffff92cbeaaf8040
> [ 1015.335904] FS: 00007f38b114b740(0000) GS:ffff92cc00e00000(0000)
> knlGS:0000000000000000
> [ 1015.344005] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1015.349761] CR2: 0000000000000000 CR3: 0000005e83002001 CR4:
> 00000000007606e0
> [ 1015.356901] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [ 1015.364042] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> [ 1015.371182] PKRU: 55555554
> [ 1015.373895] Call Trace:
> [ 1015.376352] ? wait_woken+0x80/0x80
> [ 1015.379852] blk_queue_bio+0x131/0x460
> [ 1015.383611] generic_make_request+0x1a4/0x410
> [ 1015.387983] raid10_unplug+0x112/0x1b0 [raid10]
> [ 1015.392520] ? raid10_unplug+0x112/0x1b0 [raid10]
> [ 1015.397234] blk_flush_plug_list+0xce/0x250
> [ 1015.401430] blk_finish_plug+0x2c/0x40
> [ 1015.405191] ext4_writepages+0x635/0xe90
> [ 1015.409130] ? generic_perform_write+0x124/0x1b0
> [ 1015.413756] do_writepages+0x4b/0xe0
> [ 1015.417341] ? ext4_mark_inode_dirty+0x1d0/0x1d0
> [ 1015.421970] ? do_writepages+0x4b/0xe0
> [ 1015.425733] ? call_rcu+0x10/0x20
> [ 1015.429061] ? inode_switch_wbs+0x15d/0x190
> [ 1015.433253] __filemap_fdatawrite_range+0xc1/0x100
> [ 1015.438053] ? __filemap_fdatawrite_range+0xc1/0x100
> [ 1015.443029] file_write_and_wait_range+0x5a/0xb0
> [ 1015.447658] ext4_sync_file+0x111/0x3b0
> [ 1015.451505] vfs_fsync_range+0x48/0x80
> [ 1015.455284] ? __fget_light+0x54/0x60
> [ 1015.458966] do_fsync+0x3d/0x70
> [ 1015.462139] __x64_sys_fsync+0x14/0x20
> [ 1015.465900] do_syscall_64+0x5a/0x120
> [ 1015.469576] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 1015.475044] RIP: 0033:0x7f38afe86b07
> [ 1015.478985] Code: 00 00 0f 05 48 3d 00 f0 ff ff 77 3f f3 c3 0f 1f 44 00
> 00 53 89 fb 48 83 ec 10 e8 04 f5 ff ff 89 df 89 c2 b8 4a 00 00 00 0f 05
> <48> 3d 00 f0 ff ff4
> [ 1015.498501] RSP: 002b:00007fff53bc4140 EFLAGS: 00000293 ORIG_RAX:
> 000000000000004a
> [ 1015.506448] RAX: ffffffffffffffda RBX: 0000000000000004 RCX:
> 00007f38afe86b07
> [ 1015.513971] RDX: 0000000000000000 RSI: 00007fff53bc4170 RDI:
> 0000000000000004
> [ 1015.521484] RBP: 00007fff53bc4170 R08: 0000000000000000 R09:
> 000000000000000a
> [ 1015.528991] R10: 00000000fffffff6 R11: 0000000000000293 R12:
> 0000561e723e1b68
> [ 1015.536504] R13: 0000000000000000 R14: 00007fff53bc42b4 R15:
> 0000000000000000
> [ 1015.544001] Modules linked in: ebtable_filter ebtables ip6table_filter
> iptable_filter nbd openvswitch nsh nf_nat_ipv6 nf_nat_ipv4 nf_conncount
> nf_nat bonding ip6tab
> [ 1015.544039] raid1 raid10 ses enclosure scsi_transport_sas ib_uverbs
> ib_core mlx5_core mgag200 i2c_algo_bit mlxfw ttm devlink drm_kms_helper
> syscopyarea sysfillreci
> [ 1015.654479] CR2: 0000000000000000
> [ 0.084151] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR
> 38d is b0)
> [ 0.472249] BUG: unable to handle kernel paging request at
> 0000000000002088
> [ 0.473712] PGD 0 P4D 0
> [ 0.473712] Oops: 0000 [#1] SMP PTI
> [ 0.473712] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.0+ #3
> [ 0.473712] Hardware name: Dell Inc. PowerEdge R640/0W23H8, BIOS 1.2.11
> 10/19/2017
> [ 0.473712] RIP: 0010:__alloc_pages_nodemask+0xdc/0x280
> [ 0.473712] Code: 00 00 44 89 fa 80 ca 80 83 f8 01 89 d8 44 0f 44 fa 48
> 8b 55 b0 c1 e8 08 83 e0 01 88 45 c8 48 89 f8 48 85 d2 0f 85 27 01 00 00
> <3b> 77 08 0f 82 1e7
> [ 0.473712] RSP: 0000:ffffb998000db7c8 EFLAGS: 00010246
> [ 0.473712] RAX: 0000000000002080 RBX: 00000000006012c0 RCX:
> 0000000000000000
> [ 0.473712] RDX: 0000000000000000 RSI: 0000000000000002 RDI:
> 0000000000002080
> [ 0.473712] RBP: ffffb998000db820 R08: 0000000000000000 R09:
> 0000000000000000
> [ 0.473712] R10: ffffb998000db8a0 R11: 000000000000000f R12:
> 0000000000000000
> [ 0.473712] R13: 0000000000000000 R14: 00000000006012c0 R15:
> 0000000000000001
> [ 0.473712] FS: 0000000000000000(0000) GS:ffff95edefe00000(0000)
> knlGS:0000000000000000
> [ 0.473712] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 0.473712] CR2: 0000000000002088 CR3: 000000002a00a001 CR4:
> 00000000007606f0
> [ 0.473712] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [ 0.473712] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> [ 0.473712] PKRU: 00000000
> [ 0.473712] Call Trace:
> [ 0.473712] new_slab+0xaa/0x710
> [ 0.473712] ___slab_alloc+0x37f/0x550
> [ 0.473712] ? acpi_ut_trace_ptr+0x2c/0x74
> [ 0.473712] ? alloc_desc+0x3c/0x220
> [ 0.473712] __slab_alloc+0x20/0x40
> [ 0.473712] ? __slab_alloc+0x20/0x40
> [ 0.473712] kmem_cache_alloc_node_trace+0xaf/0x200
> [ 0.473712] alloc_desc+0x3c/0x220
> [ 0.473712] __irq_alloc_descs+0x1c9/0x240
> [ 0.473712] irq_domain_alloc_descs+0x87/0xb0
> [ 0.473712] __irq_domain_alloc_irqs+0x1f2/0x310
> [ 0.473712] mp_map_pin_to_irq+0x299/0x2f0
> [ 0.473712] ? strstr+0x2c/0x70
> [ 0.473712] mp_map_gsi_to_irq+0xb5/0xe0
> [ 0.473712] acpi_register_gsi_ioapic+0x79/0x180
> [ 0.473712] acpi_register_gsi+0x15/0x20
> [ 0.473712] acpi_pci_irq_enable+0x124/0x2a0
> [ 0.473712] ? pci_read_config_word+0x23/0x40
> [ 0.473712] ? quirk_intel_mc_errata+0xd0/0xd0
> [ 0.473712] pcibios_enable_device+0x2e/0x40
> [ 0.473712] do_pci_enable_device+0x88/0x100
> [ 0.473712] pci_enable_device_flags+0xe8/0x130
> [ 0.473712] pci_enable_device+0x13/0x20
> [ 0.473712] pci_enable_bridge+0x52/0x90
> [ 0.473712] pci_enable_device_flags+0x91/0x130
> [ 0.473712] pci_enable_device_mem+0x13/0x20
> [ 0.473712] mellanox_check_broken_intx_masking+0x61/0x120
> [ 0.473712] pci_do_fixups+0xc9/0x120
> [ 0.473712] ? set_debug_rodata+0x17/0x17
> [ 0.473712] pci_apply_final_quirks+0x7a/0x127
> [ 0.473712] ? pci_proc_init+0x76/0x76
> [ 0.473712] do_one_initcall+0x4a/0x1c9
> [ 0.473712] kernel_init_freeable+0x21a/0x2c9
> [ 0.473712] ? rest_init+0xb0/0xb0
> [ 0.473712] kernel_init+0xe/0x110
> [ 0.473712] ret_from_fork+0x35/0x40
> [ 0.473712] Modules linked in:
> [ 0.473712] CR2: 0000000000002088
> [ 0.473712] ---[ end trace ac0676b30797a2d2 ]---
> [ 0.473712] RIP: 0010:__alloc_pages_nodemask+0xdc/0x280
> [ 0.473712] Code: 00 00 44 89 fa 80 ca 80 83 f8 01 89 d8 44 0f 44 fa 48
> 8b 55 b0 c1 e8 08 83 e0 01 88 45 c8 48 89 f8 48 85 d2 0f 85 27 01 00 00
> <3b> 77 08 0f 82 1e7
> [ 0.473712] RSP: 0000:ffffb998000db7c8 EFLAGS: 00010246
> [ 0.473712] RAX: 0000000000002080 RBX: 00000000006012c0 RCX:
> 0000000000000000
> [ 0.473712] RDX: 0000000000000000 RSI: 0000000000000002 RDI:
> 0000000000002080
> [ 0.473712] RBP: ffffb998000db820 R08: 0000000000000000 R09:
> 0000000000000000
> [ 0.473712] R10: ffffb998000db8a0 R11: 000000000000000f R12:
> 0000000000000000
> [ 0.473712] R13: 0000000000000000 R14: 00000000006012c0 R15:
> 0000000000000001
> [ 0.473712] FS: 0000000000000000(0000) GS:ffff95edefe00000(0000)
> knlGS:0000000000000000
> [ 0.473712] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 0.473712] CR2: 0000000000002088 CR3: 000000002a00a001 CR4:
> 00000000007606f0
> [ 0.473712] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [ 0.473712] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> [ 0.473712] PKRU: 00000000
> [ 0.862647] Kernel panic - not syncing: Attempted to kill init!
> exitcode=0x00000009
> [ 0.866614] ---[ end Kernel panic - not syncing: Attempted to kill init!
> exitcode=0x00000009 ]---
>
> Longer details: I saw the panic originally when testing the recently
> submitted cpuset cgroup2 controller on a system with Ubuntu 18.04
> userspace. The only difference is that "cpuset" is in the list of
> available controllers, so I was doing "echo +io +cpuset" below. I am
> booting with 'cgroup_no_v1=all systemd.unified_cgroup_hierarchy=1':
>
> # mount | grep cgroup2
> cgroup on /sys/fs/cgroup type cgroup2
> (rw,nosuid,nodev,noexec,relatime,nsdelegate)
> # cd /sys/fs/cgroup
> # ls
> cgroup.controllers      cgroup.procs            cgroup.threads  user.slice
> cgroup.max.depth        cgroup.stat             init.scope
> cgroup.max.descendants  cgroup.subtree_control  system.slice
> # cat cgroup.controllers
> cpu io memory pids rdma
> # cat cgroup.subtree_control
> cpu memory pids
> # echo "+io" > cgroup.subtree_control
> ... wait a few seconds ...
> above panic is emitted on serial console
>

Thanks for providing the oops and the details! Could you test rc1 to
confirm that this issue is resolved? Also, could you tell me a little
more about your disk setup so I can reproduce this more easily? The
oops above has the raid10 driver in the call stack.

[1] https://lore.kernel.org/lkml/20180911184137.35897-1-dennisszhou@xxxxxxxxx/

Thanks,
Dennis
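For reference, the quoted reproduction compares two cgroup2 interface files: cgroup.controllers lists the controllers available at a cgroup, and cgroup.subtree_control lists the ones enabled for its children. Writing "+io" to cgroup.subtree_control is the step after which the panic was reported. Below is a minimal POSIX sh sketch, assuming both files hold space-separated controller names as in the transcript; the function name pending_controllers is hypothetical and not part of any existing tool. It only reads its arguments and prints the controllers that are available but not yet enabled, so it never writes to cgroup.subtree_control itself.

```sh
# Hypothetical helper (not an existing tool): print, space separated, the
# controllers listed in $1 (cgroup.controllers contents) that are missing
# from $2 (cgroup.subtree_control contents). Read-only: writes nothing.
pending_controllers() {
    available=$1
    enabled=$2
    out=
    # Unquoted $available is intentionally word-split into controller names.
    for c in $available; do
        case " $enabled " in
            *" $c "*) ;;                  # already enabled for children
            *) out="${out:+$out }$c" ;;   # available but not yet enabled
        esac
    done
    printf '%s\n' "$out"
}

# Example read-only use, assuming the cgroup2 root is mounted at /sys/fs/cgroup:
# pending_controllers "$(cat /sys/fs/cgroup/cgroup.controllers)" \
#                     "$(cat /sys/fs/cgroup/cgroup.subtree_control)"
```

With the values from the report (cgroup.controllers is "cpu io memory pids rdma" and cgroup.subtree_control is "cpu memory pids"), the function prints "io rdma". Matching on " name " with surrounding spaces avoids false hits on controller names that are prefixes of one another.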