On Sun, 2015-02-22 at 18:20 +0200, Sagi Grimberg wrote:
> On 2/19/2015 9:23 PM, Chris Moore wrote:
> > We're running RHEL 7.1 kernel plus the patches that Sagi helped port for me. On the target we're getting the stack trace below (list_del corruption).
> >
> > Leading up to the stack trace we get a bunch of:
> > Feb 10 16:47:28 target-1 kernel: ib_post_send() failed for IB_WR_RDMA_READ
> > Feb 10 16:47:28 target-1 kernel: ib_post_send failed with -12
> >
>
> Hey Chris,
>
> It seems that you ran out of SQ space. Does the rhel7.1 code include
> the completion coalescing code?
>
> Can you tell us the scenario you ran to invoke this bug?
>
> The trace below indicate some form of use-after-free where we try to
> delete cmd->i_conn_node.
>
> > Followed by a bunch of aborts:
> > Feb 10 16:48:30 target-1 kernel: ABORT_TASK: Found referenced iSCSI task_tag: 26
> > Feb 10 16:48:30 target-1 kernel: ABORT_TASK: ref_tag: 26 already complete, skipping
> > Feb 10 16:48:30 target-1 kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 26
> >
> > I'm working on tracking down the cause of the ib_post_send failure. Has anyone seen the list_del corruption issue?
>
> I am also seeing these kind of corruptions on rhel7.0 when working
> against multiple initiators (~60 sessions) doing stress logins/logouts.
> It doesn't look good...
>
> >
> >
> > Feb 10 16:50:19 target-1 kernel: ------------[ cut here ]------------
> > Feb 10 16:50:19 target-1 kernel: WARNING: at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0()
> > Feb 10 16:50:19 target-1 kernel: list_del corruption. prev->next should be ffff8800c10b5040, but was ffff8802f35fb640
> > Feb 10 16:50:19 target-1 kernel: Modules linked in: tcp_lp target_core_pscsi target_core_file target_core_iblock 8021q garp mrp bnep bluetooth rfkill fuse xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT iptable_filter ip_tables tun bridge stp llc xprtrdma sunrpc ib_isert(F) iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad iTCO_wdt iTCO_vendor_support ocrdma ib_core ib_addr lpc_ich serio_raw shpchp wmi mfd_core dm_mirror dm_region_hash dm_log dm_mod intel_powerclamp ipmi_devintf coretemp pcspkr kvm_intel kvm dcdbas crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel
> > Feb 10 16:50:19 target-1 kernel: aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ipmi_si ipmi_msghandler acpi_power_meter uinput i7core_edac edac_core ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_common sr_mod cdrom ata_generic pata_acpi mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm drm mptsas ata_piix be2net scsi_transport_sas libata i2c_core mptscsih vxlan mptbase ip_tunnel bnx2
> > Feb 10 16:50:19 target-1 kernel: CPU: 12 PID: 25240 Comm: kworker/12:2 Tainted: GF I -------------- 3.10.0-224.el7.x86_64 #1
> > Feb 10 16:50:19 target-1 kernel: Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS 6.3.0 07/24/2012
> > Feb 10 16:50:19 target-1 kernel: Workqueue: isert_comp_wq isert_do_control_comp [ib_isert]
> > Feb 10 16:50:19 target-1 kernel: ffff8805f8f57d38 00000000f4d510fb ffff8805f8f57cf0 ffffffff81603f46
> > Feb 10 16:50:19 target-1 kernel: ffff8805f8f57d28 ffffffff8106e28b ffff8800c10b4e00 ffff8800c10b5040
> > Feb 10 16:50:19 target-1 kernel: 0000000000000000 ffff8800c10b4e40 ffff88031e04aee8 ffff8805f8f57d90
> > Feb 10 16:50:19 target-1 kernel: Call Trace:
> > Feb 10 16:50:19 target-1 kernel: [<ffffffff81603f46>] dump_stack+0x19/0x1b
> > Feb 10 16:50:19 target-1 kernel: [<ffffffff8106e28b>] warn_slowpath_common+0x6b/0xb0
> > Feb 10 16:50:19 target-1 kernel: [<ffffffff8106e32c>] warn_slowpath_fmt+0x5c/0x80
> > Feb 10 16:50:19 target-1 kernel: [<ffffffff812ed571>] __list_del_entry+0xa1/0xd0
> > Feb 10 16:50:19 target-1 kernel: [<ffffffffa05a757f>] isert_completion_put+0x26f/0x3f0 [ib_isert]
> > Feb 10 16:50:19 target-1 kernel: [<ffffffffa05a77cd>] isert_do_control_comp+0xcd/0x1b0 [ib_isert]
> > Feb 10 16:50:19 target-1 kernel: [<ffffffff8108f0ab>] process_one_work+0x17b/0x470
> > Feb 10 16:50:19 target-1 kernel: [<ffffffff8108fe8b>] worker_thread+0x11b/0x400
> > Feb 10 16:50:19 target-1 kernel: [<ffffffff8108fd70>] ? rescuer_thread+0x400/0x400
> > Feb 10 16:50:19 target-1 kernel: [<ffffffff8109726f>] kthread+0xcf/0xe0
> > Feb 10 16:50:19 target-1 kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
> > Feb 10 16:50:19 target-1 kernel: [<ffffffff81613d3c>] ret_from_fork+0x7c/0xb0
> > Feb 10 16:50:19 target-1 kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
> > Feb 10 16:50:19 target-1 kernel: ---[ end trace 14489862ea1e51d4 ]---

Hi Chris,

Can you confirm if the RHEL 7.1 based build contains commit 5159d763f..?

iscsi/iser-target: Use list_del_init for ->i_conn_node
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5159d763f

Another thing that would be helpful is to git diff the RHEL 7.1 tree
vs. what's in linux-stable/linux-3.10.y (3.10.69) for
drivers/target/iscsi/ + drivers/infiniband/ulp/isert/ code.

Any chance to get ahold of this diff..?

--nab
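
For anyone following along, here is a rough user-space sketch of why the list_del vs. list_del_init distinction matters when the same node can be deleted twice. It is not the isert or list_debug.c code: struct list_head is re-declared locally, and the *_checked helpers and second_delete_passes() are hypothetical names made up for this illustration. The check only mimics the "prev->next should be X" test from the warning above.

```c
#include <assert.h>
#include <stdbool.h>

/* User-space stand-in for the kernel's struct list_head (illustration only). */
struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Same idea as the debug check: both neighbours must point back at e. */
static bool list_entry_linked_ok(const struct list_head *e)
{
    return e->prev->next == e && e->next->prev == e;
}

/* Unlink only; e keeps its stale next/prev. Returns false on corruption. */
static bool list_del_checked(struct list_head *e)
{
    if (!list_entry_linked_ok(e))
        return false;
    e->prev->next = e->next;
    e->next->prev = e->prev;
    return true;
}

/* Unlink, then point e back at itself so a repeated delete is a no-op. */
static bool list_del_init_checked(struct list_head *e)
{
    if (!list_del_checked(e))
        return false;
    init_list_head(e);
    return true;
}

/* Hypothetical demo: delete the same node twice, report the second check. */
bool second_delete_passes(bool use_del_init)
{
    struct list_head head, a, b;
    bool first;

    init_list_head(&head);
    list_add_tail(&a, &head);
    list_add_tail(&b, &head);

    first = use_del_init ? list_del_init_checked(&a) : list_del_checked(&a);
    assert(first); /* the first delete always sees a consistent list */
    (void)first;

    return use_del_init ? list_del_init_checked(&a) : list_del_checked(&a);
}
```

In this sketch second_delete_passes(false) fails the check, because head.next already points at b, while second_delete_passes(true) passes, because the node was re-initialised to point at itself. The real list_del() additionally overwrites the node's pointers with poison values, which the sketch leaves out.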