RE: 3.12.5 Target Errors

> -----Original Message-----
> From: target-devel-owner@xxxxxxxxxxxxxxx [mailto:target-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Sagi Grimberg
> Sent: Thursday, May 15, 2014 9:50 AM
> To: Moussa Ba (moussaba); target-devel@xxxxxxxxxxxxxxx
> Cc: Nicholas Bellinger
> Subject: Re: 3.12.5 Target Errors
> 
> On 5/15/2014 7:35 PM, Moussa Ba (moussaba) wrote:
> > We are deploying a test environment in house and received complaints
> > of failures while installing OSes on target LUNs.  Looking at the
> > target machine, we observed the following errors in dmesg; they
> > repeat.  Is this a known issue? Has it been fixed?
> 
> Hey Moussa,
> 
> I can say that since kernel 3.12.5, ib_isert has received some
> important stability fixes.
> 
> The below list corruption seems to originate in the TX coalescing work
> that was done by Nic.
> 
> Does your kernel have the below commit applied? (although I don't know
> if that went to 3.12 stable kernels...)
> commit ebbe442183b7b8192c963266f1c89048fefc63a5
> Author: Nicholas Bellinger <nab@xxxxxxxxxxxxxxx>
> Date:   Sun Mar 2 14:51:12 2014 -0800
> 
>      iser-target: Fix command leak for tx_desc->comp_llnode_batch
> 
>      This patch addresses a number of active I/O shutdown issues
>      related to isert_cmd descriptors being leaked that are part
>      of a completion interrupt coalescing batch.
> 
>      This includes adding logic in isert_cq_tx_comp_err() to
>      drain any associated tx_desc->comp_llnode_batch, as well
>      as isert_cq_drain_comp_llist() to drain any associated
>      isert_conn->conn_comp_llist.
> 
>      Also, set tx_desc->llnode_active in isert_init_send_wr()
>      in order to determine when work requests need to be skipped
>      in isert_cq_tx_work() exception path code.
> 
>      Finally, update isert_init_send_wr() to only allow interrupt
>      coalescing when ISER_CONN_UP.
> 
>      Acked-by: Sagi Grimberg <sagig@xxxxxxxxxxxx>
>      Cc: Or Gerlitz <ogerlitz@xxxxxxxxxxxx>
>      Cc: <stable@xxxxxxxxxxxxxxx> #3.13+
>      Signed-off-by: Nicholas Bellinger <nab@xxxxxxxxxxxxxxx>
> 
> Moreover, a more detailed scenario may also help...
> 
> Cheers,
> Sagi.

The patch above was not applied.  The setup consists of 3.12.5 target machines with 32 LUNs defined, all backed by PCIe SSDs.  Initiators are VMware v5.5 using the latest Ethernet iSER drivers.  LUNs are mapped to datastores.  In one instance we received reports of targets simply disappearing on ESX. We don't yet have dmesg output for those reports. In the instance I reported, however, the user was going through the installation of a CentOS VM that never completed. Checking on the target, I observed the errors I attached earlier.
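
As background on the warning text itself: with list debugging enabled, list_del() overwrites the removed entry's next/prev pointers with poison values, and a later list_del() on the same entry is reported as "->next is LIST_POISON1". The sketch below is a simplified Python model of that check, not the kernel code; the ListHead class and the "cmd" entry are illustrative names, and the poison constants are the x86_64 values (LIST_POISON1 is the one printed in the warnings). A repeated removal of the same entry would be consistent with what the traces show, but the model does not prove that is what happens in ib_isert.

```python
# Poison values written into a removed entry (x86_64). LIST_POISON1 is
# the value printed in the warnings: dead000000100100.
LIST_POISON1 = 0xdead000000100100
LIST_POISON2 = 0xdead000000200200


class ListHead:
    """Minimal stand-in for the kernel's circular doubly linked list node."""

    def __init__(self, name):
        self.name = name
        self.next = self
        self.prev = self


def list_add(new, head):
    """Insert `new` right after `head`."""
    new.next = head.next
    new.prev = head
    head.next.prev = new
    head.next = new


def list_del(entry):
    """Unlink `entry` and poison its pointers.

    Returns False (and prints a warning) if the entry was already
    removed, which is the situation the debug check reports.
    """
    if entry.next == LIST_POISON1:
        print(f"list_del corruption, {entry.name}->next is "
              f"LIST_POISON1 ({LIST_POISON1:x})")
        return False
    entry.next.prev = entry.prev
    entry.prev.next = entry.next
    entry.next = LIST_POISON1
    entry.prev = LIST_POISON2
    return True


head = ListHead("head")
cmd = ListHead("cmd")
list_add(cmd, head)
first = list_del(cmd)    # normal removal
second = list_del(cmd)   # same entry removed again: warning path
```

In this model the second call leaves the list untouched and only reports the problem, which is why the warning can repeat for several entries without an immediate crash.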

Which stable version would you recommend as including the most recent stability fixes?  We have 4 target systems deployed, and I suspect this issue will manifest itself on all 4, hence my desire to resolve it quickly. Thank you.
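
To compare the four targets once dmesg output is available from each, a small script along the following lines could summarize the list_del warnings per machine. This is only a sketch: the function name summarize and the SAMPLE text are made up for illustration (the sample lines are copied from the output quoted below), and it assumes the message format shown in this thread.

```python
import re
from collections import Counter

# Message format as it appears in the quoted dmesg output, e.g.
# "list_del corruption, ffff880444de29d0->next is LIST_POISON1 (dead000000100100)"
CORRUPTION_RE = re.compile(
    r"list_del corruption, ([0-9a-f]+)->(?:next|prev) is LIST_POISON[12]"
)
# "Workqueue: isert_comp_wq isert_cq_tx_work [ib_isert]"
WORKQUEUE_RE = re.compile(r"Workqueue: (\S+) (\S+)")


def summarize(dmesg_text):
    """Return corrupted entry addresses and a count of work handlers."""
    # Collapse whitespace so messages broken across lines still match.
    flat = " ".join(dmesg_text.split())
    entries = CORRUPTION_RE.findall(flat)
    handlers = Counter(func for _wq, func in WORKQUEUE_RE.findall(flat))
    return {"entries": entries, "handlers": dict(handlers)}


SAMPLE = """
[190711.622192] list_del corruption, ffff880444de29d0->next is LIST_POISON1 (dead000000100100)
[190711.622258] Workqueue: isert_comp_wq isert_cq_tx_work [ib_isert]
[190711.622321] list_del corruption, ffff880444ddca70->next is
LIST_POISON1 (dead000000100100)
[190711.622356] Workqueue: isert_comp_wq isert_cq_tx_work [ib_isert]
"""

summary = summarize(SAMPLE)
print(summary)
```

Running it against the saved dmesg of each target would show whether the same handler and the same kind of corruption appear on all of them.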

Moussa
> 
> > Moussa
> >
> > [190660.362733] Unknown RDMA CMA event: 15
> > [190668.946341] Unknown RDMA CMA event: 15
> > [190676.147304] iSCSI Login timeout on Network Portal 192.168.252.101:3269
> > [190676.147365] iSCSI Login negotiation failed.
> > [190676.160321] Unknown RDMA CMA event: 8
> > [190691.172622] iSCSI Login timeout on Network Portal 192.168.252.101:3269
> > [190691.172685] iSCSI Login negotiation failed.
> > [190711.622169] ------------[ cut here ]------------
> > [190711.622190] WARNING: CPU: 0 PID: 6780 at lib/list_debug.c:53 __list_del_entry+0x63/0xd0()
> > [190711.622192] list_del corruption, ffff880444de29d0->next is LIST_POISON1 (dead000000100100)
> > [190711.622192] Modules linked in: ib_srpt tcm_qla2xxx qla2xxx tcm_loop tcm_fc libfc scsi_transport_fc scsi_tgt ib_isert rdma_cm iw_cm ib_addr iscsi_target_mod target_core_pscsi target_core_file target_core_iblock target_core_mod ebtable_nat ebtables configfs ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge stp llc autofs4 sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 ib_ipoib ib_cm ib_uverbs ib_umad mlx4_en mlx4_ib ib_sa ib_mad ib_core mlx4_core dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan vhost tun kvm_intel kvm uinput iTCO_wdt iTCO_vendor_support microcode serio_raw pcspkr sb_edac edac_core i2c_i801 sg lpc_ich mfd_core igb i2c_algo_bit i2c_core ptp pps_core mtip32xx(OF) ioatdma dca wmi ext3(F) jbd(F) mbcache(F) sd_mod(F) crc_t10dif(F) crct10dif_common(F) ahci(F) libahci(F) isci(F) libsas(F) scsi_transport_sas(F) [last unloaded: speedstep_lib]
> > [190711.622252] CPU: 0 PID: 6780 Comm: kworker/u33:5 Tainted: GF O 3.12.5+ #1
> > [190711.622254] Hardware name: Supermicro X9DRX+-F/X9DRX+-F, BIOS 3.00 07/09/2013
> > [190711.622258] Workqueue: isert_comp_wq isert_cq_tx_work [ib_isert]
> > [190711.622260]  0000000000000035 ffff88083afb5c28 ffffffff81552ef7 0000000000000035
> > [190711.622263]  ffff88083afb5c78 ffff88083afb5c68 ffffffff8104d1ac ffff88083afb5c78
> > [190711.622265]  ffff880444de29d0 ffff880444de2c90 ffff88045f608000 ffff8804581773e0
> > [190711.622267] Call Trace:
> > [190711.622272]  [<ffffffff81552ef7>] dump_stack+0x49/0x62
> > [190711.622275]  [<ffffffff8104d1ac>] warn_slowpath_common+0x8c/0xc0
> > [190711.622277]  [<ffffffff8104d296>] warn_slowpath_fmt+0x46/0x50
> > [190711.622279]  [<ffffffff8127fe13>] __list_del_entry+0x63/0xd0
> > [190711.622281]  [<ffffffff8127fe91>] list_del+0x11/0x40
> > [190711.622284]  [<ffffffffa076bcf4>] isert_put_cmd+0xb4/0x1e0 [ib_isert]
> > [190711.622287]  [<ffffffffa076eede>] isert_completion_put+0x6e/0xe0 [ib_isert]
> > [190711.622290]  [<ffffffffa076f11c>] isert_cq_comp_err+0x2c/0xd0 [ib_isert]
> > [190711.622292]  [<ffffffffa076f5c9>] isert_cq_tx_work+0x89/0x110 [ib_isert]
> > [190711.622295]  [<ffffffff81068553>] process_one_work+0x183/0x490
> > [190711.622297]  [<ffffffff81069a2f>] worker_thread+0x11f/0x3a0
> > [190711.622299]  [<ffffffff81069910>] ? manage_workers+0x160/0x160
> > [190711.622302]  [<ffffffff8106f94e>] kthread+0xce/0xe0
> > [190711.622304]  [<ffffffff8106f880>] ? kthread_freezable_should_stop+0x70/0x70
> > [190711.622312]  [<ffffffff8155f66c>] ret_from_fork+0x7c/0xb0
> > [190711.622314]  [<ffffffff8106f880>] ? kthread_freezable_should_stop+0x70/0x70
> > [190711.622315] ---[ end trace b0c0a2e3b5c1820b ]---
> > [190711.622318] ------------[ cut here ]------------
> > [190711.622320] WARNING: CPU: 0 PID: 6780 at lib/list_debug.c:53 __list_del_entry+0x63/0xd0()
> > [190711.622321] list_del corruption, ffff880444ddca70->next is LIST_POISON1 (dead000000100100)
> > [190711.622321] Modules linked in: ib_srpt tcm_qla2xxx qla2xxx tcm_loop tcm_fc libfc scsi_transport_fc scsi_tgt ib_isert rdma_cm iw_cm ib_addr iscsi_target_mod target_core_pscsi target_core_file target_core_iblock target_core_mod ebtable_nat ebtables configfs ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge stp llc autofs4 sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 ib_ipoib ib_cm ib_uverbs ib_umad mlx4_en mlx4_ib ib_sa ib_mad ib_core mlx4_core dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan vhost tun kvm_intel kvm uinput iTCO_wdt iTCO_vendor_support microcode serio_raw pcspkr sb_edac edac_core i2c_i801 sg lpc_ich mfd_core igb i2c_algo_bit i2c_core ptp pps_core mtip32xx(OF) ioatdma dca wmi ext3(F) jbd(F) mbcache(F) sd_mod(F) crc_t10dif(F) crct10dif_common(F) ahci(F) libahci(F) isci(F) libsas(F) scsi_transport_sas(F) [last unloaded: speedstep_lib]
> > [190711.622353] CPU: 0 PID: 6780 Comm: kworker/u33:5 Tainted: GF W  O 3.12.5+ #1
> > [190711.622354] Hardware name: Supermicro X9DRX+-F/X9DRX+-F, BIOS 3.00 07/09/2013
> > [190711.622356] Workqueue: isert_comp_wq isert_cq_tx_work [ib_isert]
> > [190711.622357]  0000000000000035 ffff88083afb5c28 ffffffff81552ef7 0000000000000035
> > [190711.622359]  ffff88083afb5c78 ffff88083afb5c68 ffffffff8104d1ac ffff88083afb5c78
> > [190711.622361]  ffff880444ddca70 ffff880444ddcd30 ffff88045f608000 ffff8804581773e0
> > [190711.622363] Call Trace:
> > [190711.622365]  [<ffffffff81552ef7>] dump_stack+0x49/0x62
> > [190711.622367]  [<ffffffff8104d1ac>] warn_slowpath_common+0x8c/0xc0
> > [190711.622368]  [<ffffffff8104d296>] warn_slowpath_fmt+0x46/0x50
> > [190711.622371]  [<ffffffff8127fe13>] __list_del_entry+0x63/0xd0
> > [190711.622373]  [<ffffffff8127fe91>] list_del+0x11/0x40
> > [190711.622375]  [<ffffffffa076bcf4>] isert_put_cmd+0xb4/0x1e0 [ib_isert]
> > [190711.622377]  [<ffffffffa076eede>] isert_completion_put+0x6e/0xe0 [ib_isert]
> > [190711.622380]  [<ffffffffa076f11c>] isert_cq_comp_err+0x2c/0xd0 [ib_isert]
> > [190711.622382]  [<ffffffffa076f5c9>] isert_cq_tx_work+0x89/0x110 [ib_isert]
> > [190711.622384]  [<ffffffff81068553>] process_one_work+0x183/0x490
> > [190711.622386]  [<ffffffff81069a2f>] worker_thread+0x11f/0x3a0
> > [190711.622387]  [<ffffffff81069910>] ? manage_workers+0x160/0x160
> > [190711.622389]  [<ffffffff8106f94e>] kthread+0xce/0xe0
> > [190711.622391]  [<ffffffff8106f880>] ? kthread_freezable_should_stop+0x70/0x70
> > [190711.622393]  [<ffffffff8155f66c>] ret_from_fork+0x7c/0xb0
> > [190711.622395]  [<ffffffff8106f880>] ? kthread_freezable_should_stop+0x70/0x70
> > [190711.622396] ---[ end trace b0c0a2e3b5c1820c ]---
> > [190711.622398] ------------[ cut here ]------------
> > [190711.622400] WARNING: CPU: 0 PID: 6780 at lib/list_debug.c:53 __list_del_entry+0x63/0xd0()
> > [190711.622401] list_del corruption, ffff880444dc5a90->next is LIST_POISON1 (dead000000100100)
> > [190711.622402] Modules linked in: ib_srpt tcm_qla2xxx qla2xxx tcm_loop tcm_fc libfc scsi_transport_fc scsi_tgt ib_isert rdma_cm iw_cm ib_addr iscsi_target_mod target_core_pscsi target_core_file target_core_iblock target_core_mod ebtable_nat ebtables configfs ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge stp llc autofs4 sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 ib_ipoib ib_cm ib_uverbs ib_umad mlx4_en mlx4_ib ib_sa ib_mad ib_core mlx4_core dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan vhost tun kvm_intel kvm uinput iTCO_wdt iTCO_vendor_support microcode serio_raw pcspkr sb_edac edac_core i2c_i801 sg lpc_ich mfd_core igb i2c_algo_bit i2c_core ptp pps_core mtip32xx(OF) ioatdma dca wmi ext3(F) jbd(F) mbcache(F) sd_mod(F) crc_t10dif(F) crct10dif_common(F) ahci(F) libahci(F) isci(F) libsas(F) scsi_transport_sas(F) [last unloaded: speedstep_lib]
> > [190711.622433] CPU: 0 PID: 6780 Comm: kworker/u33:5 Tainted: GF W  O 3.12.5+ #1
> > [190711.622433] Hardware name: Supermicro X9DRX+-F/X9DRX+-F, BIOS 3.00 07/09/2013
> > [190711.622435] Workqueue: isert_comp_wq isert_cq_tx_work [ib_isert]
> > [190711.622436]  0000000000000035 ffff88083afb5c28 ffffffff81552ef7 0000000000000035
> > [190711.622438]  ffff88083afb5c78 ffff88083afb5c68 ffffffff8104d1ac ffff88083afb5c78
> > [190711.622440]  ffff880444dc5a90 ffff880444dc5d50 ffff88045f608000 ffff8804581773e0
> > [190711.622442] Call Trace:
> > [190711.622444]  [<ffffffff81552ef7>] dump_stack+0x49/0x62
> > [190711.622446]  [<ffffffff8104d1ac>] warn_slowpath_common+0x8c/0xc0
> > [190711.622448]  [<ffffffff8104d296>] warn_slowpath_fmt+0x46/0x50
> > [190711.622450]  [<ffffffff8127fe13>] __list_del_entry+0x63/0xd0
> > [190711.622452]  [<ffffffff8127fe91>] list_del+0x11/0x40
> > [190711.622454]  [<ffffffffa076bcf4>] isert_put_cmd+0xb4/0x1e0 [ib_isert]
> > [190711.622456]  [<ffffffffa076eede>] isert_completion_put+0x6e/0xe0 [ib_isert]
> > [190711.622459]  [<ffffffffa076f11c>] isert_cq_comp_err+0x2c/0xd0 [ib_isert]
> > [190711.622461]  [<ffffffffa076f5c9>] isert_cq_tx_work+0x89/0x110 [ib_isert]
> > [190711.622463]  [<ffffffff81068553>] process_one_work+0x183/0x490
> > [190711.622465]  [<ffffffff81069a2f>] worker_thread+0x11f/0x3a0
> > [190711.622466]  [<ffffffff81069910>] ? manage_workers+0x160/0x160
> > [190711.622468]  [<ffffffff8106f94e>] kthread+0xce/0xe0
> > [190711.622470]  [<ffffffff8106f880>] ? kthread_freezable_should_stop+0x70/0x70
> > [190711.622472]  [<ffffffff8155f66c>] ret_from_fork+0x7c/0xb0
> > [190711.622474]  [<ffffffff8106f880>] ? kthread_freezable_should_stop+0x70/0x70
> > [190711.622475] ---[ end trace b0c0a2e3b5c1820d ]---
> > [190711.622477] ------------[ cut here ]------------
> > --
> > To unsubscribe from this list: send the line "unsubscribe target-devel" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 