Hi,

This is what we got with 4.19.195. We caught the corruption.
It could be a race in the kernel: on the same CPU, the same kernel thread tries to both add and delete.
We see the same on more CPUs under this heavy load.

Jun 17 18:31:45 localhost kernel: [ 0.000000] Linux version 4.19.195-KM9 (root@bb1ab379213a) (gcc version 8.3.1 20190311 (Red Hat 8.3.1-3) (GCC)) #1 SMP Wed Jun 16 14:02:34 UTC 2021
Jun 17 18:31:45 localhost kernel: [ 0.000000] Command line: ro root=LABEL=/ rd_NO_LUKS KEYBOARDTYPE=pc KEYTABLE=us LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=512M,high nompath append="nmi_watchdog=2" printk.time=1 rd_NO_LVM rd_NO_DM dm_mod.use_blk_mq=y rcutree.kthread_prio=99 intel_pstate=enable intel_idle.max_cstate=0 processor.max_cstate=1 idle=halt console=tty0 console=ttyS0,38400n8d
Jun 17 18:58:23 c-node06 kernel: [26921.734822] ?
Jun 17 18:58:23 c-node06 kernel: [26921.734845] WARNING: CPU: 56 PID: 51893 at lib/list_debug.c:56 __list_del_entry_valid+0x8a/0x90
Jun 17 18:58:23 c-node06 kernel: [26921.734846] Modules linked in: iscsi_scst(OE) crc32c_intel scst_local(OE) netconsole scst_user(OE) scst(OE) drbd lru_cache be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio libcxgb ib_iser(OE) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi udf crc_itu_t 8021q mrp garp nfsd nfs_acl auth_rpcgss lockd sunrpc grace ipt_MASQUERADE xt_nat xt_state iptable_nat nf_nat_ipv4 xt_addrtype xt_conntrack nf_nat nf_conntrack nf_defrag_ipv4 nf_defrag_ipv6 libcrc32c br_netfilter bridge stp llc overlay dm_multipath rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx5_ib(OE) ib_uverbs(OE) mlx4_ib(OE) ib_core(OE) mlx4_core(OE) fuse binfmt_misc mlx5_core(OE) devlink mdev(OE) mlx_compat(OE) mlxfw(OE) pci_hyperv hv_balloon hv_utils
Jun 17 18:58:23 c-node06 kernel: [26921.734884] ptp pps_core hv_netvsc pcspkr i2c_piix4 joydev sr_mod(E) cdrom(E) ext4(E) jbd2(E) mbcache(E) hv_storvsc(E) scsi_transport_fc(E)
hid_hyperv(E) hyperv_keyboard(E) floppy(E) hyperv_fb(E) hv_vmbus(E) [last unloaded: scst_local] Jun 17 18:58:23 c-node06 kernel: [26921.734896] CPU: 56 PID: 51893 Comm: km_target_creat Kdump: loaded Tainted: G W OE 4.19.195-KM9 #1 Jun 17 18:58:23 c-node06 kernel: [26921.734897] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018 Jun 17 18:58:23 c-node06 kernel: [26921.734899] RIP: 0010:__list_del_entry_valid+0x8a/0x90 Jun 17 18:58:23 c-node06 kernel: [26921.734901] Code: 44 00 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 e0 0c 0e b9 e8 37 52 44 00 0f 0b 31 c0 c3 48 c7 c7 20 0d 0e b9 e8 26 52 44 00 <0f> 0b 31 c0 c3 90 48 85 d2 41 55 41 54 55 53 74 5f 48 85 f6 74 64 Jun 17 18:58:23 c-node06 kernel: [26921.734902] RSP: 0018:ffff8f8a2b2f7c08 EFLAGS: 00010086 Jun 17 18:58:23 c-node06 kernel: [26921.734903] RAX: 0000000000000000 RBX: ffff8f87dc9e2740 RCX: 0000000000000006 Jun 17 18:58:23 c-node06 kernel: [26921.734904] RDX: 0000000000000007 RSI: 0000000000000082 RDI: ffff8f8adfc164f0 Jun 17 18:58:23 c-node06 kernel: [26921.734905] RBP: ffff8f8a2b3c8800 R08: 0000000000000064 R09: 0000000000000002 Jun 17 18:58:23 c-node06 kernel: [26921.734906] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 Jun 17 18:58:23 c-node06 kernel: [26921.734906] R13: ffff8f87dc9e2520 R14: 0000000000021e00 R15: ffff8f8adf6a1e00 Jun 17 18:58:23 c-node06 kernel: [26921.734908] FS: 00007fb35f8ec700(0000) GS:ffff8f8adfc00000(0000) knlGS:0000000000000000 Jun 17 18:58:23 c-node06 kernel: [26921.734908] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jun 17 18:58:23 c-node06 kernel: [26921.734909] CR2: ffffffffff600800 CR3: 000000201800e006 CR4: 00000000003606e0 Jun 17 18:58:23 c-node06 kernel: [26921.734912] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Jun 17 18:58:23 c-node06 kernel: [26921.734912] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Jun 17 18:58:23 c-node06 kernel: [26921.734913] Call Trace: Jun 
17 18:58:23 c-node06 kernel: [26921.734919] __delist_rt_entity+0x12/0x80
Jun 17 18:58:23 c-node06 kernel: [26921.734923] dequeue_rt_stack+0x75/0x280
Jun 17 18:58:23 c-node06 kernel: [26921.734924] dequeue_rt_entity+0x1f/0x70
Jun 17 18:58:23 c-node06 kernel: [26921.734925] dequeue_task_rt+0x26/0x70
Jun 17 18:58:23 c-node06 kernel: [26921.734926] push_rt_task+0x1e2/0x220
Jun 17 18:58:23 c-node06 kernel: [26921.734928] task_woken_rt+0x47/0x50
Jun 17 18:58:23 c-node06 kernel: [26921.734933] ttwu_do_wakeup+0x44/0x140
Jun 17 18:58:23 c-node06 kernel: [26921.734935] try_to_wake_up+0x1d2/0x460
Jun 17 18:58:23 c-node06 kernel: [26921.734940] ? sock_write_iter+0x97/0x100
Jun 17 18:58:23 c-node06 kernel: [26921.734942] wake_up_q+0x54/0x70
Jun 17 18:58:23 c-node06 kernel: [26921.734948] futex_wake+0x142/0x160
Jun 17 18:58:23 c-node06 kernel: [26921.734950] do_futex+0x2cc/0x9f0
Jun 17 18:58:23 c-node06 kernel: [26921.734955] ? vfs_writev+0xc5/0x100
Jun 17 18:58:23 c-node06 kernel: [26921.734958] ? __bad_area_nosemaphore+0x126/0x190
Jun 17 18:58:23 c-node06 kernel: [26921.734960] __x64_sys_futex+0x143/0x180
Jun 17 18:58:23 c-node06 kernel: [26921.734962] ? do_writev+0xe7/0x100
Jun 17 18:58:23 c-node06 kernel: [26921.734968] do_syscall_64+0x59/0x1b0
Jun 17 18:58:23 c-node06 kernel: [26921.734973] ?
page_fault+0x8/0x30
Jun 17 18:58:23 c-node06 kernel: [26921.734975] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jun 17 18:58:23 c-node06 kernel: [26921.734977] RIP: 0033:0x7fc6114754c5
Jun 17 18:58:23 c-node06 kernel: [26921.734978] Code: 00 00 00 00 00 56 52 c7 07 00 00 00 00 81 f6 80 00 00 00 64 23 34 25 48 00 00 00 83 ce 01 ba 01 00 00 00 b8 ca 00 00 00 0f 05 <5a> 5e c3 0f 1f 84 00 00 00 00 00 41 54 41 55 49 89 fc 49 89 f5 48
Jun 17 18:58:23 c-node06 kernel: [26921.734979] RSP: 002b:00007fb35f8ea560 EFLAGS: 00000206 ORIG_RAX: 00000000000000ca
Jun 17 18:58:23 c-node06 kernel: [26921.734980] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc6114754c5
Jun 17 18:58:23 c-node06 kernel: [26921.734981] RDX: 0000000000000001 RSI: 0000000000000081 RDI: 00007fbe040a0ec0
Jun 17 18:58:23 c-node06 kernel: [26921.734982] RBP: 00007fb35f8ea780 R08: 000000009a9c5a85 R09: 00007fb35f8ea4a8
Jun 17 18:58:23 c-node06 kernel: [26921.734982] R10: 0000000000000004 R11: 0000000000000206 R12: 00007fbe0409f398
Jun 17 18:58:23 c-node06 kernel: [26921.734983] R13: 0000000000000000 R14: 00000000000000fe R15: 0000000000000000
Jun 17 18:58:23 c-node06 kernel: [26921.734984] ---[ end trace d290eac16902b305 ]---
Jun 17 18:58:23 c-node06 kernel: [26921.734988] WARNING: CPU: 56 PID: 51893 at kernel/sched/rt.c:1250 __enqueue_rt_entity+0x313/0x370
Jun 17 18:58:23 c-node06 kernel: [26921.734989] Modules linked in: iscsi_scst(OE) crc32c_intel scst_local(OE) netconsole scst_user(OE) scst(OE) drbd lru_cache be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio libcxgb ib_iser(OE) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi udf crc_itu_t 8021q mrp garp nfsd nfs_acl auth_rpcgss lockd sunrpc grace ipt_MASQUERADE xt_nat xt_state iptable_nat nf_nat_ipv4 xt_addrtype xt_conntrack nf_nat nf_conntrack nf_defrag_ipv4 nf_defrag_ipv6 libcrc32c br_netfilter bridge
stp llc overlay dm_multipath rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx5_ib(OE) ib_uverbs(OE) mlx4_ib(OE) ib_core(OE) mlx4_core(OE) fuse binfmt_misc mlx5_core(OE) devlink mdev(OE) mlx_compat(OE) mlxfw(OE) pci_hyperv hv_balloon hv_utils Jun 17 18:58:23 c-node06 kernel: [26921.735003] ptp pps_core hv_netvsc pcspkr i2c_piix4 joydev sr_mod(E) cdrom(E) ext4(E) jbd2(E) mbcache(E) hv_storvsc(E) scsi_transport_fc(E) hid_hyperv(E) hyperv_keyboard(E) floppy(E) hyperv_fb(E) hv_vmbus(E) [last unloaded: scst_local] Jun 17 18:58:23 c-node06 kernel: [26921.735009] CPU: 56 PID: 51893 Comm: km_target_creat Kdump: loaded Tainted: G W OE 4.19.195-KM9 #1 Jun 17 18:58:23 c-node06 kernel: [26921.735010] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018 Jun 17 18:58:23 c-node06 kernel: [26921.735011] RIP: 0010:__enqueue_rt_entity+0x313/0x370 Jun 17 18:58:23 c-node06 kernel: [26921.735011] Code: ff ff e9 11 ff ff ff 48 83 c4 08 48 89 ee 48 89 df 5b 5d 41 5c 41 5d e9 fb f9 ff ff ba 01 00 00 00 66 89 53 24 e9 cf fd ff ff <0f> 0b e9 6d fd ff ff 48 8b 83 a0 01 00 00 48 8d ab 70 01 00 00 c7 Jun 17 18:58:23 c-node06 kernel: [26921.735012] RSP: 0018:ffff8f8a2b2f7c20 EFLAGS: 00010002 Jun 17 18:58:23 c-node06 kernel: [26921.735013] RAX: ffff8f8a2b3c8800 RBX: ffff8f8a2d2cb540 RCX: ffff8f8adf9a2050 Jun 17 18:58:23 c-node06 kernel: [26921.735014] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8f8a2d2cb540 Jun 17 18:58:23 c-node06 kernel: [26921.735014] RBP: ffff8f8adf9a2040 R08: 0000000000000003 R09: 0000000000000000 Jun 17 18:58:23 c-node06 kernel: [26921.735015] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8f8adf9a2670 Jun 17 18:58:23 c-node06 kernel: [26921.735016] R13: ffff8f87dc9e2520 R14: 0000000000021e00 R15: ffff8f8adf6a1e00 Jun 17 18:58:23 c-node06 kernel: [26921.735016] FS: 00007fb35f8ec700(0000) GS:ffff8f8adfc00000(0000) knlGS:0000000000000000 Jun 17 18:58:23 c-node06 
kernel: [26921.735017] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jun 17 18:58:23 c-node06 kernel: [26921.735018] CR2: ffffffffff600800 CR3: 000000201800e006 CR4: 00000000003606e0 Jun 17 18:58:23 c-node06 kernel: [26921.735018] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Jun 17 18:58:23 c-node06 kernel: [26921.735019] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Jun 17 18:58:23 c-node06 kernel: [26921.735020] Call Trace: Jun 17 18:58:23 c-node06 kernel: [26921.735021] ? dequeue_rt_stack+0x1ed/0x280 Jun 17 18:58:23 c-node06 kernel: [26921.735022] dequeue_rt_entity+0x4d/0x70 Jun 17 18:58:23 c-node06 kernel: [26921.735023] dequeue_task_rt+0x26/0x70 Jun 17 18:58:23 c-node06 kernel: [26921.735025] push_rt_task+0x1e2/0x220 Jun 17 18:58:23 c-node06 kernel: [26921.735026] task_woken_rt+0x47/0x50 Jun 17 18:58:23 c-node06 kernel: [26921.735028] ttwu_do_wakeup+0x44/0x140 Jun 17 18:58:23 c-node06 kernel: [26921.735030] try_to_wake_up+0x1d2/0x460 Jun 17 18:58:23 c-node06 kernel: [26921.735031] ? sock_write_iter+0x97/0x100 Jun 17 18:58:23 c-node06 kernel: [26921.735032] wake_up_q+0x54/0x70 Jun 17 18:58:23 c-node06 kernel: [26921.735034] futex_wake+0x142/0x160 Jun 17 18:58:23 c-node06 kernel: [26921.735036] do_futex+0x2cc/0x9f0 Jun 17 18:58:23 c-node06 kernel: [26921.735037] ? vfs_writev+0xc5/0x100 Jun 17 18:58:23 c-node06 kernel: [26921.735039] ? __bad_area_nosemaphore+0x126/0x190 Jun 17 18:58:23 c-node06 kernel: [26921.735040] __x64_sys_futex+0x143/0x180 Jun 17 18:58:23 c-node06 kernel: [26921.735042] ? do_writev+0xe7/0x100 Jun 17 18:58:23 c-node06 kernel: [26921.735043] do_syscall_64+0x59/0x1b0 Jun 17 18:58:23 c-node06 kernel: [26921.735045] ? 
page_fault+0x8/0x30 Jun 17 18:58:23 c-node06 kernel: [26921.735046] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Jun 17 18:58:23 c-node06 kernel: [26921.735047] RIP: 0033:0x7fc6114754c5 Jun 17 18:58:23 c-node06 kernel: [26921.735047] Code: 00 00 00 00 00 56 52 c7 07 00 00 00 00 81 f6 80 00 00 00 64 23 34 25 48 00 00 00 83 ce 01 ba 01 00 00 00 b8 ca 00 00 00 0f 05 <5a> 5e c3 0f 1f 84 00 00 00 00 00 41 54 41 55 49 89 fc 49 89 f5 48 Jun 17 18:58:23 c-node06 kernel: [26921.735048] RSP: 002b:00007fb35f8ea560 EFLAGS: 00000206 ORIG_RAX: 00000000000000ca Jun 17 18:58:23 c-node06 kernel: [26921.735049] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc6114754c5 Jun 17 18:58:23 c-node06 kernel: [26921.735050] RDX: 0000000000000001 RSI: 0000000000000081 RDI: 00007fbe040a0ec0 Jun 17 18:58:23 c-node06 kernel: [26921.735051] RBP: 00007fb35f8ea780 R08: 000000009a9c5a85 R09: 00007fb35f8ea4a8 Jun 17 18:58:23 c-node06 kernel: [26921.735051] R10: 0000000000000004 R11: 0000000000000206 R12: 00007fbe0409f398 Jun 17 18:58:23 c-node06 kernel: [26921.735052] R13: 0000000000000000 R14: 00000000000000fe R15: 0000000000000000 Jun 17 18:58:23 c-node06 kernel: [26921.735053] ---[ end trace d290eac16902b306 ]--- Jun 17 18:58:23 c-node06 kernel: [26921.735054] ------------[ cut here ]------------ Jun 17 18:58:23 c-node06 kernel: [26921.735055] list_add double add: new=ffff8f8a2d2cb540, prev=ffff8f8a2d2cb540, next=ffff8f8adf9a2670. 
Jun 17 18:58:23 c-node06 kernel: [26921.735066] WARNING: CPU: 56 PID: 51893 at lib/list_debug.c:31 __list_add_valid+0x67/0x70
Jun 17 18:58:23 c-node06 kernel: [26921.735066] Modules linked in: iscsi_scst(OE) crc32c_intel scst_local(OE) netconsole scst_user(OE) scst(OE) drbd lru_cache be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio libcxgb ib_iser(OE) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi udf crc_itu_t 8021q mrp garp nfsd nfs_acl auth_rpcgss lockd sunrpc grace ipt_MASQUERADE xt_nat xt_state iptable_nat nf_nat_ipv4 xt_addrtype xt_conntrack nf_nat nf_conntrack nf_defrag_ipv4 nf_defrag_ipv6 libcrc32c br_netfilter bridge stp llc overlay dm_multipath rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx5_ib(OE) ib_uverbs(OE) mlx4_ib(OE) ib_core(OE) mlx4_core(OE) fuse binfmt_misc mlx5_core(OE) devlink mdev(OE) mlx_compat(OE) mlxfw(OE) pci_hyperv hv_balloon hv_utils
Jun 17 18:58:23 c-node06 kernel: [26921.735077] ptp pps_core hv_netvsc pcspkr i2c_piix4 joydev sr_mod(E) cdrom(E) ext4(E) jbd2(E) mbcache(E) hv_storvsc(E) scsi_transport_fc(E) hid_hyperv(E) hyperv_keyboard(E) floppy(E) hyperv_fb(E) hv_vmbus(E) [last unloaded: scst_local]
Jun 17 18:58:23 c-node06 kernel: [26921.735080] CPU: 56 PID: 51893 Comm: km_target_creat Kdump: loaded Tainted: G W OE 4.19.195-KM9 #1
Jun 17 18:58:23 c-node06 kernel: [26921.735080] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018
Jun 17 18:58:23 c-node06 kernel: [26921.735082] RIP: 0010:__list_add_valid+0x67/0x70
Jun 17 18:58:23 c-node06 kernel: [26921.735082] Code: c1 4c 89 c6 48 c7 c7 e8 0b 0e b9 e8 d3 52 44 00 0f 0b 31 c0 c3 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 38 0c 0e b9 e8 b9 52 44 00 <0f> 0b 31 c0 c3 0f 1f 40 00 48 b9 00 01 00 00 00 00 ad de 48 8b 07
Jun 17 18:58:23 c-node06 kernel: [26921.735083] RSP: 0018:ffff8f8a2b2f7c18 EFLAGS: 00010086
Jun 17 18:58:23 c-node06 kernel: [26921.735083] RAX:
0000000000000000 RBX: ffff8f8a2d2cb540 RCX: 0000000000000006
Jun 17 18:58:23 c-node06 kernel: [26921.735084] RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffff8f8adfc164f0
Jun 17 18:58:23 c-node06 kernel: [26921.735084] RBP: ffff8f8adf9a2040 R08: 0000000000000068 R09: 0000000000000002
Jun 17 18:58:23 c-node06 kernel: [26921.735084] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8f8adf9a2670
Jun 17 18:58:23 c-node06 kernel: [26921.735085] R13: ffff8f8a2d2cb540 R14: 0000000000021e00 R15: ffff8f8adf6a1e00
Jun 17 18:58:23 c-node06 kernel: [26921.735085] FS: 00007fb35f8ec700(0000) GS:ffff8f8adfc00000(0000) knlGS:0000000000000000
Jun 17 18:58:23 c-node06 kernel: [26921.735086] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 17 18:58:23 c-node06 kernel: [26921.735086] CR2: ffffffffff600800 CR3: 000000201800e006 CR4: 00000000003606e0
Jun 17 18:58:23 c-node06 kernel: [26921.735087] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 17 18:58:23 c-node06 kernel: [26921.735087] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jun 17 18:58:23 c-node06 kernel: [26921.735088] Call Trace:
Jun 17 18:58:23 c-node06 kernel: [26921.735088] __enqueue_rt_entity+0x227/0x370
Jun 17 18:58:23 c-node06 kernel: [26921.735089] ? dequeue_rt_stack+0x1ed/0x280
Jun 17 18:58:23 c-node06 kernel: [26921.735090] dequeue_rt_entity+0x4d/0x70
Jun 17 18:58:23 c-node06 kernel: [26921.735091] dequeue_task_rt+0x26/0x70
Jun 17 18:58:23 c-node06 kernel: [26921.735091] push_rt_task+0x1e2/0x220
Jun 17 18:58:23 c-node06 kernel: [26921.735092] task_woken_rt+0x47/0x50
Jun 17 18:58:23 c-node06 kernel: [26921.735093] ttwu_do_wakeup+0x44/0x140
Jun 17 18:58:23 c-node06 kernel: [26921.735095] try_to_wake_up+0x1d2/0x460
Jun 17 18:58:23 c-node06 kernel: [26921.735096] ?
sock_write_iter+0x97/0x100 Jun 17 18:58:23 c-node06 kernel: [26921.735098] wake_up_q+0x54/0x70 Jun 17 18:58:23 c-node06 kernel: [26921.735099] futex_wake+0x142/0x160 Jun 17 18:58:23 c-node06 kernel: [26921.735101] do_futex+0x2cc/0x9f0 Jun 17 18:58:23 c-node06 kernel: [26921.735102] ? vfs_writev+0xc5/0x100 Jun 17 18:58:23 c-node06 kernel: [26921.735104] ? __bad_area_nosemaphore+0x126/0x190 Jun 17 18:58:23 c-node06 kernel: [26921.735105] __x64_sys_futex+0x143/0x180 Jun 17 18:58:23 c-node06 kernel: [26921.735106] ? do_writev+0xe7/0x100 Jun 17 18:58:23 c-node06 kernel: [26921.735108] do_syscall_64+0x59/0x1b0 Jun 17 18:58:23 c-node06 kernel: [26921.735109] ? page_fault+0x8/0x30 Jun 17 18:58:23 c-node06 kernel: [26921.735110] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Jun 17 18:58:23 c-node06 kernel: [26921.735111] RIP: 0033:0x7fc6114754c5 Jun 17 18:58:23 c-node06 kernel: [26921.735112] Code: 00 00 00 00 00 56 52 c7 07 00 00 00 00 81 f6 80 00 00 00 64 23 34 25 48 00 00 00 83 ce 01 ba 01 00 00 00 b8 ca 00 00 00 0f 05 <5a> 5e c3 0f 1f 84 00 00 00 00 00 41 54 41 55 49 89 fc 49 89 f5 48 Jun 17 18:58:23 c-node06 kernel: [26921.735112] RSP: 002b:00007fb35f8ea560 EFLAGS: 00000206 ORIG_RAX: 00000000000000ca Jun 17 18:58:23 c-node06 kernel: [26921.735113] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc6114754c5 Jun 17 18:58:23 c-node06 kernel: [26921.735114] RDX: 0000000000000001 RSI: 0000000000000081 RDI: 00007fbe040a0ec0 Jun 17 18:58:23 c-node06 kernel: [26921.735114] RBP: 00007fb35f8ea780 R08: 000000009a9c5a85 R09: 00007fb35f8ea4a8 Jun 17 18:58:23 c-node06 kernel: [26921.735115] R10: 0000000000000004 R11: 0000000000000206 R12: 00007fbe0409f398 Jun 17 18:58:23 c-node06 kernel: [26921.735115] R13: 0000000000000000 R14: 00000000000000fe R15: 0000000000000000 Jun 17 18:58:23 c-node06 kernel: [26921.735117] ---[ end trace d290eac16902b307 ]--- Jun 17 18:58:23 c-node06 kernel: [26921.735124] ------------[ cut here ]------------ Jun 17 18:58:23 c-node06 kernel: [26921.735126] 
list_del corruption. prev->next should be ffff8f7e2649a740, but was ffff8f8a3347e630 Jun 17 18:58:23 c-node06 kernel: [26921.735136] WARNING: CPU: 46 PID: 53761 at lib/list_debug.c:53 __list_del_entry_valid+0x79/0x90 Jun 17 18:58:23 c-node06 kernel: [26921.735137] Modules linked in: iscsi_scst(OE) crc32c_intel scst_local(OE) netconsole scst_user(OE) scst(OE) drbd lru_cache be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio libcxgb ib_iser(OE) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi udf crc_itu_t 8021q mrp garp nfsd nfs_acl auth_rpcgss lockd sunrpc grace ipt_MASQUERADE xt_nat xt_state iptable_nat nf_nat_ipv4 xt_addrtype xt_conntrack nf_nat nf_conntrack nf_defrag_ipv4 nf_defrag_ipv6 libcrc32c br_netfilter bridge stp llc overlay dm_multipath rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx5_ib(OE) ib_uverbs(OE) mlx4_ib(OE) ib_core(OE) mlx4_core(OE) fuse binfmt_misc mlx5_core(OE) devlink mdev(OE) mlx_compat(OE) mlxfw(OE) pci_hyperv hv_balloon hv_utils Jun 17 18:58:23 c-node06 kernel: [26921.735154] ptp pps_core hv_netvsc pcspkr i2c_piix4 joydev sr_mod(E) cdrom(E) ext4(E) jbd2(E) mbcache(E) hv_storvsc(E) scsi_transport_fc(E) hid_hyperv(E) hyperv_keyboard(E) floppy(E) hyperv_fb(E) hv_vmbus(E) [last unloaded: scst_local] Jun 17 18:58:23 c-node06 kernel: [26921.735160] CPU: 46 PID: 53761 Comm: STAR4BLKS1_WORK Kdump: loaded Tainted: G W OE 4.19.195-KM9 #1 Jun 17 18:58:23 c-node06 kernel: [26921.735160] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018 Jun 17 18:58:23 c-node06 kernel: [26921.735162] RIP: 0010:__list_del_entry_valid+0x79/0x90 Jun 17 18:58:23 c-node06 kernel: [26921.735163] Code: 0b 31 c0 c3 48 89 fe 48 c7 c7 a8 0c 0e b9 e8 4e 52 44 00 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 e0 0c 0e b9 e8 37 52 44 00 <0f> 0b 31 c0 c3 48 c7 c7 20 0d 0e b9 e8 26 52 44 00 0f 0b 31 c0 c3 Jun 17 18:58:23 c-node06 kernel: [26921.735164] 
RSP: 0018:ffff8f513372fb40 EFLAGS: 00010086 Jun 17 18:58:23 c-node06 kernel: [26921.735165] RAX: 0000000000000000 RBX: ffff8f7e2649a740 RCX: 0000000000000006 Jun 17 18:58:23 c-node06 kernel: [26921.735165] RDX: 0000000000000007 RSI: 0000000000000096 RDI: ffff8f8adf9964f0 Jun 17 18:58:23 c-node06 kernel: [26921.735166] RBP: ffff8f8a2b3c8800 R08: 0000000000000064 R09: 0000000000000002 Jun 17 18:58:23 c-node06 kernel: [26921.735166] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 Jun 17 18:58:23 c-node06 kernel: [26921.735167] R13: 0000000000000062 R14: 0000000000021e00 R15: ffff8f8adf9e1e00 Jun 17 18:58:23 c-node06 kernel: [26921.735168] FS: 00007fa03dac4700(0000) GS:ffff8f8adf980000(0000) knlGS:0000000000000000 Jun 17 18:58:23 c-node06 kernel: [26921.735168] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jun 17 18:58:23 c-node06 kernel: [26921.735169] CR2: ffffffffff600400 CR3: 000000201800e006 CR4: 00000000003606e0 Jun 17 18:58:23 c-node06 kernel: [26921.735171] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Jun 17 18:58:23 c-node06 kernel: [26921.735171] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Jun 17 18:58:23 c-node06 kernel: [26921.735172] Call Trace: Jun 17 18:58:23 c-node06 kernel: [26921.735175] __delist_rt_entity+0x12/0x80 Jun 17 18:58:23 c-node06 kernel: [26921.735176] dequeue_rt_stack+0x75/0x280 Jun 17 18:58:23 c-node06 kernel: [26921.735177] dequeue_rt_entity+0x1f/0x70 Jun 17 18:58:23 c-node06 kernel: [26921.735178] dequeue_task_rt+0x26/0x70 Jun 17 18:58:23 c-node06 kernel: [26921.735179] push_rt_task+0x1e2/0x220 Jun 17 18:58:23 c-node06 kernel: [26921.735180] push_rt_tasks+0x11/0x20 Jun 17 18:58:23 c-node06 kernel: [26921.735182] __balance_callback+0x3b/0x60 Jun 17 18:58:23 c-node06 kernel: [26921.735187] __schedule+0x6e9/0x830 Jun 17 18:58:23 c-node06 kernel: [26921.735190] schedule+0x28/0x80 Jun 17 18:58:23 c-node06 kernel: [26921.735193] futex_wait_queue_me+0xb9/0x120 Jun 17 18:58:23 
c-node06 kernel: [26921.735194] futex_wait+0x139/0x250 Jun 17 18:58:23 c-node06 kernel: [26921.735196] ? try_to_wake_up+0x54/0x460 Jun 17 18:58:23 c-node06 kernel: [26921.735197] ? enqueue_task_rt+0x9f/0xc0 Jun 17 18:58:23 c-node06 kernel: [26921.735199] do_futex+0x2eb/0x9f0 Jun 17 18:58:23 c-node06 kernel: [26921.735204] ? plist_add+0xc1/0xf0 Jun 17 18:58:23 c-node06 kernel: [26921.735205] ? plist_add+0xc1/0xf0 Jun 17 18:58:23 c-node06 kernel: [26921.735206] ? plist_del+0x5f/0xb0 Jun 17 18:58:23 c-node06 kernel: [26921.735210] ? __switch_to+0x115/0x420 Jun 17 18:58:23 c-node06 kernel: [26921.735211] __x64_sys_futex+0x143/0x180 Jun 17 18:58:23 c-node06 kernel: [26921.735216] do_syscall_64+0x59/0x1b0 Jun 17 18:58:23 c-node06 kernel: [26921.735217] ? prepare_exit_to_usermode+0x70/0x90 Jun 17 18:58:23 c-node06 kernel: [26921.735219] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Jun 17 18:58:23 c-node06 kernel: [26921.735220] RIP: 0033:0x7fc611475334 Jun 17 18:58:23 c-node06 kernel: [26921.735221] Code: 66 0f 1f 44 00 00 41 52 52 4d 31 d2 ba 02 00 00 00 81 f6 80 00 00 00 64 23 34 25 48 00 00 00 39 d0 75 07 b8 ca 00 00 00 0f 05 <89> d0 87 07 85 c0 75 f1 5a 41 5a c3 83 3d f1 df 20 00 00 74 59 48 Jun 17 18:58:23 c-node06 kernel: [26921.735221] RSP: 002b:00007fa03dac2f60 EFLAGS: 00000202 ORIG_RAX: 00000000000000ca Jun 17 18:58:23 c-node06 kernel: [26921.735222] RAX: ffffffffffffffda RBX: 00007fa1af1e9768 RCX: 00007fc611475334 Jun 17 18:58:23 c-node06 kernel: [26921.735223] RDX: 0000000000000002 RSI: 0000000000000080 RDI: 00007fa1af1e97f8 Jun 17 18:58:23 c-node06 kernel: [26921.735223] RBP: 00007fa03dac2f80 R08: 00007fa1af1e97f8 R09: 000000000000d201 Jun 17 18:58:23 c-node06 kernel: [26921.735224] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000 Jun 17 18:58:23 c-node06 kernel: [26921.735224] R13: 00007fa1af1e97a8 R14: 0000000000000001 R15: 0000000000b477c4 Jun 17 18:58:23 c-node06 kernel: [26921.735225] ---[ end trace d290eac16902b308 ]--- Jun 17 18:58:23 
c-node06 kernel: [26921.735232] list_add corruption. prev->next should be next (ffff8f8a2b3c8e30), but was ffff8f6a1fd14c40. (prev=ffff8f7e2649a740). Jun 17 18:58:23 c-node06 kernel: [26921.735240] WARNING: CPU: 46 PID: 53761 at lib/list_debug.c:28 __list_add_valid+0x4d/0x70 Jun 17 18:58:23 c-node06 kernel: [26921.735240] Modules linked in: iscsi_scst(OE) crc32c_intel scst_local(OE) netconsole scst_user(OE) scst(OE) drbd lru_cache be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio libcxgb ib_iser(OE) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi udf crc_itu_t 8021q mrp garp nfsd nfs_acl auth_rpcgss lockd sunrpc grace ipt_MASQUERADE xt_nat xt_state iptable_nat nf_nat_ipv4 xt_addrtype xt_conntrack nf_nat nf_conntrack nf_defrag_ipv4 nf_defrag_ipv6 libcrc32c br_netfilter bridge stp llc overlay dm_multipath rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx5_ib(OE) ib_uverbs(OE) mlx4_ib(OE) ib_core(OE) mlx4_core(OE) fuse binfmt_misc mlx5_core(OE) devlink mdev(OE) mlx_compat(OE) mlxfw(OE) pci_hyperv hv_balloon hv_utils Jun 17 18:58:23 c-node06 kernel: [26921.735256] ptp pps_core hv_netvsc pcspkr i2c_piix4 joydev sr_mod(E) cdrom(E) ext4(E) jbd2(E) mbcache(E) hv_storvsc(E) scsi_transport_fc(E) hid_hyperv(E) hyperv_keyboard(E) floppy(E) hyperv_fb(E) hv_vmbus(E) [last unloaded: scst_local] Jun 17 18:58:23 c-node06 kernel: [26921.735261] CPU: 46 PID: 53761 Comm: STAR4BLKS1_WORK Kdump: loaded Tainted: G W OE 4.19.195-KM9 #1 Jun 17 18:58:23 c-node06 kernel: [26921.735261] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018 Jun 17 18:58:23 c-node06 kernel: [26921.735263] RIP: 0010:__list_add_valid+0x4d/0x70 Jun 17 18:58:23 c-node06 kernel: [26921.735264] Code: c3 48 89 d1 48 c7 c7 98 0b 0e b9 48 89 c2 e8 ea 52 44 00 0f 0b 31 c0 c3 48 89 c1 4c 89 c6 48 c7 c7 e8 0b 0e b9 e8 d3 52 44 00 <0f> 0b 31 c0 c3 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 38 0c 0e b9 
e8
Jun 17 18:58:23 c-node06 kernel: [26921.735265] RSP: 0018:ffff8f8adf983f10 EFLAGS: 00010082
Jun 17 18:58:23 c-node06 kernel: [26921.735266] RAX: 0000000000000000 RBX: ffff8f7e2658a740 RCX: 0000000000000006
Jun 17 18:58:23 c-node06 kernel: [26921.735266] RDX: 0000000000000007 RSI: 0000000000000096 RDI: ffff8f8adf9964f0
Jun 17 18:58:23 c-node06 kernel: [26921.735267] RBP: ffff8f8a2b3c8800 R08: 0000000000000088 R09: 0000000000000002
Jun 17 18:58:23 c-node06 kernel: [26921.735268] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8f8a2b3c8e30
Jun 17 18:58:23 c-node06 kernel: [26921.735268] R13: ffff8f7e2649a740 R14: 0000000000000000 R15: 0000000000000000
Jun 17 18:58:23 c-node06 kernel: [26921.735269] FS: 00007fa03dac4700(0000) GS:ffff8f8adf980000(0000) knlGS:0000000000000000
Jun 17 18:58:23 c-node06 kernel: [26921.735270] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 17 18:58:23 c-node06 kernel: [26921.735271] CR2: ffffffffff600400 CR3: 000000201800e006 CR4: 00000000003606e0
Jun 17 18:58:23 c-node06 kernel: [26921.735272] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 17 18:58:23 c-node06 kernel: [26921.735273] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jun 17 18:58:23 c-node06 kernel: [26921.735273] Call Trace:
Jun 17 18:58:23 c-node06 kernel: [26921.735274] <IRQ>
Jun 17 18:58:23 c-node06 kernel: [26921.735275] __enqueue_rt_entity+0x227/0x370
Jun 17 18:58:23 c-node06 kernel: [26921.735276] ?
dequeue_rt_stack+0x1b4/0x280 Jun 17 18:58:23 c-node06 kernel: [26921.735277] enqueue_rt_entity+0x2d/0x50 Jun 17 18:58:23 c-node06 kernel: [26921.735278] enqueue_task_rt+0x2f/0xc0 Jun 17 18:58:23 c-node06 kernel: [26921.735280] ttwu_do_activate+0x44/0x80 Jun 17 18:58:23 c-node06 kernel: [26921.735283] sched_ttwu_pending+0x87/0xd0 Jun 17 18:58:23 c-node06 kernel: [26921.735285] scheduler_ipi+0xa4/0x120 Jun 17 18:58:23 c-node06 kernel: [26921.735287] reschedule_interrupt+0xf/0x20 Jun 17 18:58:23 c-node06 kernel: [26921.735288] </IRQ> Jun 17 18:58:23 c-node06 kernel: [26921.735290] RIP: 0010:_raw_spin_unlock_irqrestore+0xd/0x20 Jun 17 18:58:23 c-node06 kernel: [26921.735291] Code: 87 ff 48 29 d8 48 3d 24 f4 00 00 76 cc 80 4d 00 08 eb 98 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 c6 07 00 48 89 f7 57 9d <0f> 1f 44 00 00 c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 Jun 17 18:58:23 c-node06 kernel: [26921.735292] RSP: 0018:ffff8f513372fc30 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff02 Jun 17 18:58:23 c-node06 kernel: [26921.735293] RAX: 0000000000000000 RBX: ffff8f8adfae1e00 RCX: 0000000000000000 Jun 17 18:58:23 c-node06 kernel: [26921.735294] RDX: ffff8f8adf9a26b8 RSI: 0000000000000286 RDI: 0000000000000286 Jun 17 18:58:23 c-node06 kernel: [26921.735295] RBP: ffff8f513372fc88 R08: 0000800000000000 R09: ffff8f6a5f00e0d8 Jun 17 18:58:23 c-node06 kernel: [26921.735295] R10: 00003f8cd33bbd2e R11: 0000000000000001 R12: ffff8f51007d0000 Jun 17 18:58:23 c-node06 kernel: [26921.735296] R13: ffff8f50fc504a00 R14: ffff8f69fcc31540 R15: ffff8f8adf9a1e00 Jun 17 18:58:23 c-node06 kernel: [26921.735299] __schedule+0x6e9/0x830 Jun 17 18:58:23 c-node06 kernel: [26921.735301] schedule+0x28/0x80 Jun 17 18:58:23 c-node06 kernel: [26921.735303] futex_wait_queue_me+0xb9/0x120 Jun 17 18:58:23 c-node06 kernel: [26921.735304] futex_wait+0x139/0x250 Jun 17 18:58:23 c-node06 kernel: [26921.735306] ? try_to_wake_up+0x54/0x460 Jun 17 18:58:23 c-node06 kernel: [26921.735307] ? 
enqueue_task_rt+0x9f/0xc0
Jun 17 18:58:23 c-node06 kernel: [26921.735309] do_futex+0x2eb/0x9f0
Jun 17 18:58:23 c-node06 kernel: [26921.735311] ? plist_add+0xc1/0xf0

-----Original Message-----
From: David Mozes <david.mozes@xxxxxxx>
Sent: Wednesday, June 16, 2021 6:42 PM
To: Thomas Gleixner <tglx@xxxxxxxxxxxxx>; Matthew Wilcox <willy@xxxxxxxxxxxxx>
Cc: linux-fsdevel@xxxxxxxxxxxxxxx; Ingo Molnar <mingo@xxxxxxxxxx>; Peter Zijlstra <peterz@xxxxxxxxxxxxx>; Darren Hart <dvhart@xxxxxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx
Subject: RE: futex/call -to plist_for_each_entry_safe with head=NULL

I will try with the latest 4.19.195 and will see.

Thx
David

-----Original Message-----
From: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Sent: Tuesday, June 15, 2021 6:04 PM
To: Matthew Wilcox <willy@xxxxxxxxxxxxx>; David Mozes <david.mozes@xxxxxxx>
Cc: linux-fsdevel@xxxxxxxxxxxxxxx; Ingo Molnar <mingo@xxxxxxxxxx>; Peter Zijlstra <peterz@xxxxxxxxxxxxx>; Darren Hart <dvhart@xxxxxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx
Subject: Re: futex/call -to plist_for_each_entry_safe with head=NULL

On Sun, Jun 13 2021 at 21:04, Matthew Wilcox wrote:
> On Sun, Jun 13, 2021 at 12:24:52PM +0000, David Mozes wrote:
>> Hi *,
>> Under a very high load of io traffic, we got the below BUG trace.
>> We can see that:
>> plist_for_each_entry_safe(this, next, &hb1->chain, list) {
>>         if (match_futex (&this->key, &key1))
>>
>> were called with hb1 = NULL at futex_wake_up function.
>> And there is no protection on the code regarding such a scenario.
>>
>> The NULL can be getting from:
>> hb1 = hash_futex(&key1);

Definitely not.

>>
>> How can we protect against such a situation?
>
> Can you reproduce it without loading proprietary modules?
>
> Your analysis doesn't quite make sense:
>
>         hb1 = hash_futex(&key1);
>         hb2 = hash_futex(&key2);
>
> retry_private:
>         double_lock_hb(hb1, hb2);
>
> If hb1 were NULL, then the oops would come earlier, in double_lock_hb().

Sure, but hash_futex() _cannot_ return a NULL pointer ever.

>>
>>
>> This happened in kernel 4.19.149 running on Azure vm

4.19.149 is almost 50 versions behind the latest 4.19.194 stable.

The other question is whether this happens with a less dead kernel as
well.

Thanks,

        tglx
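
To illustrate the hash_futex() point from the quoted reply: to the best of my understanding of the 4.19 code, hash_futex() hashes the key and returns the address of one element of a global bucket array, selected with a power-of-two mask. The result is therefore always a pointer into that array. Below is a minimal userspace sketch of that pattern. It is not the kernel source: the model_* names, the table size and the mixing function are made up for illustration (the kernel uses jhash2() and sizes the table at boot).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace model of the lookup pattern; NOT kernel code. */
#define MODEL_HASH_SIZE 256u /* assumed value, must be a power of two */

struct model_key {
    uint64_t word;   /* stands in for the address part of a futex key */
    uint32_t offset; /* stands in for the offset part */
};

struct model_bucket {
    int nwaiters;
};

/* Statically allocated table: every slot has a valid, non-NULL address. */
static struct model_bucket model_queues[MODEL_HASH_SIZE];

/* Hypothetical mixer; any 32-bit hash works for the argument. */
static uint32_t model_hash(const struct model_key *key)
{
    uint64_t x = key->word ^ ((uint64_t)key->offset << 32);

    x ^= x >> 33;
    x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33;
    return (uint32_t)x;
}

/* Returns the address of an array element, so it can never be NULL. */
static struct model_bucket *model_hash_futex(const struct model_key *key)
{
    return &model_queues[model_hash(key) & (MODEL_HASH_SIZE - 1)];
}

/* Checks that many keys all map to a bucket inside the table. */
static void model_self_check(void)
{
    for (uint32_t i = 0; i < 100000; i++) {
        struct model_key k = { .word = i * 4096ull, .offset = i & 0xfff };
        struct model_bucket *hb = model_hash_futex(&k);

        assert(hb != NULL);
        assert(hb >= &model_queues[0]);
        assert(hb < &model_queues[MODEL_HASH_SIZE]);
    }
}
```

Calling model_self_check() from a small test main() exercises 100000 keys. Under these assumptions a NULL hb1 cannot come from the lookup itself, so it would have to originate somewhere else.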