Re: Have you seen this stack trace in isert?

On Sun, 2015-02-22 at 18:20 +0200, Sagi Grimberg wrote:
> On 2/19/2015 9:23 PM, Chris Moore wrote:
> > We're running RHEL 7.1 kernel plus the patches that Sagi helped port for me.  On the target we're getting the stack trace below (list_del corruption).
> >
> > Leading up to the stack trace we get a bunch of:
> > Feb 10 16:47:28 target-1 kernel: ib_post_send() failed for IB_WR_RDMA_READ
> > Feb 10 16:47:28 target-1 kernel: ib_post_send failed with -12
> >
> 
> Hey Chris,
> 
> It seems that you ran out of SQ space. Does the rhel7.1 code include
> the completion coalescing code?
> 
> Can you tell us the scenario you ran to invoke this bug?
> 
> The trace below indicates some form of use-after-free where we try to
> delete cmd->i_conn_node.
> 
> > Followed by a bunch of aborts:
> > Feb 10 16:48:30 target-1 kernel: ABORT_TASK: Found referenced iSCSI task_tag: 26
> > Feb 10 16:48:30 target-1 kernel: ABORT_TASK: ref_tag: 26 already complete, skipping
> > Feb 10 16:48:30 target-1 kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 26
> >
> > I'm working on tracking down the cause of the ib_post_send failure.  Has anyone seen the list_del corruption issue?
> 
> I am also seeing these kind of corruptions on rhel7.0 when working
> against multiple initiators (~60 sessions) doing stress logins/logouts.
> It doesn't look good...
> 
> >
> >
> > Feb 10 16:50:19 target-1 kernel: ------------[ cut here ]------------
> > Feb 10 16:50:19 target-1 kernel: WARNING: at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0()
> > Feb 10 16:50:19 target-1 kernel: list_del corruption. prev->next should be ffff8800c10b5040, but was ffff8802f35fb640
> > Feb 10 16:50:19 target-1 kernel: Modules linked in: tcp_lp target_core_pscsi target_core_file target_core_iblock 8021q garp mrp bnep bluetooth rfkill fuse xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT iptable_filter ip_tables tun bridge stp llc xprtrdma sunrpc ib_isert(F) iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad iTCO_wdt iTCO_vendor_support ocrdma ib_core ib_addr lpc_ich serio_raw shpchp wmi mfd_core dm_mirror dm_region_hash dm_log dm_mod intel_powerclamp ipmi_devintf coretemp pcspkr kvm_intel kvm dcdbas crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel
> > Feb 10 16:50:19 target-1 kernel: aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ipmi_si ipmi_msghandler acpi_power_meter uinput i7core_edac edac_core ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_common sr_mod cdrom ata_generic pata_acpi mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm drm mptsas ata_piix be2net scsi_transport_sas libata i2c_core mptscsih vxlan mptbase ip_tunnel bnx2
> > Feb 10 16:50:19 target-1 kernel: CPU: 12 PID: 25240 Comm: kworker/12:2 Tainted: GF         I --------------   3.10.0-224.el7.x86_64 #1
> > Feb 10 16:50:19 target-1 kernel: Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS 6.3.0 07/24/2012
> > Feb 10 16:50:19 target-1 kernel: Workqueue: isert_comp_wq isert_do_control_comp [ib_isert]
> > Feb 10 16:50:19 target-1 kernel: ffff8805f8f57d38 00000000f4d510fb ffff8805f8f57cf0 ffffffff81603f46
> > Feb 10 16:50:19 target-1 kernel: ffff8805f8f57d28 ffffffff8106e28b ffff8800c10b4e00 ffff8800c10b5040
> > Feb 10 16:50:19 target-1 kernel: 0000000000000000 ffff8800c10b4e40 ffff88031e04aee8 ffff8805f8f57d90
> > Feb 10 16:50:19 target-1 kernel: Call Trace:
> > Feb 10 16:50:19 target-1 kernel: [<ffffffff81603f46>] dump_stack+0x19/0x1b
> > Feb 10 16:50:19 target-1 kernel: [<ffffffff8106e28b>] warn_slowpath_common+0x6b/0xb0
> > Feb 10 16:50:19 target-1 kernel: [<ffffffff8106e32c>] warn_slowpath_fmt+0x5c/0x80
> > Feb 10 16:50:19 target-1 kernel: [<ffffffff812ed571>] __list_del_entry+0xa1/0xd0
> > Feb 10 16:50:19 target-1 kernel: [<ffffffffa05a757f>] isert_completion_put+0x26f/0x3f0 [ib_isert]
> > Feb 10 16:50:19 target-1 kernel: [<ffffffffa05a77cd>] isert_do_control_comp+0xcd/0x1b0 [ib_isert]
> > Feb 10 16:50:19 target-1 kernel: [<ffffffff8108f0ab>] process_one_work+0x17b/0x470
> > Feb 10 16:50:19 target-1 kernel: [<ffffffff8108fe8b>] worker_thread+0x11b/0x400
> > Feb 10 16:50:19 target-1 kernel: [<ffffffff8108fd70>] ? rescuer_thread+0x400/0x400
> > Feb 10 16:50:19 target-1 kernel: [<ffffffff8109726f>] kthread+0xcf/0xe0
> > Feb 10 16:50:19 target-1 kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
> > Feb 10 16:50:19 target-1 kernel: [<ffffffff81613d3c>] ret_from_fork+0x7c/0xb0
> > Feb 10 16:50:19 target-1 kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
> > Feb 10 16:50:19 target-1 kernel: ---[ end trace 14489862ea1e51d4 ]---

Hi Chris,

Can you confirm whether the RHEL 7.1-based build contains commit 5159d763f..?

iscsi/iser-target: Use list_del_init for ->i_conn_node
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5159d763f

Another thing that would be helpful is to git diff the RHEL 7.1 tree vs.
what's in linux-stable/linux-3.10.y (3.10.69) for drivers/target/iscsi/
+ drivers/infiniband/ulp/isert/ code.

Any chance you could get ahold of this diff..?

--nab

--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html