On Thu, 2015-02-26 at 14:28 +0200, Sagi Grimberg wrote: > On 2/26/2015 9:47 AM, Nicholas A. Bellinger wrote: > > On Wed, 2015-02-25 at 12:42 +0200, Sagi Grimberg wrote: > >> On 2/23/2015 10:34 AM, Nicholas A. Bellinger wrote: > >>> On Sun, 2015-02-22 at 18:36 +0200, Sagi Grimberg wrote: > >>>> On 2/21/2015 9:54 AM, Nicholas A. Bellinger wrote: > > > > <SNIP> > > > >>>> iscsit_take_action_for_connection_exit() is invoked both by RX/TX > >>>> threads. But only one should get to iscsit_close_connection() since > >>>> conn->connection_exit is set under conn->state_lock. I'd say that if > >>>> iscsit_close_connection() was invoked twice, the bug is in > >>>> iscsit_take_action_for_connection_exit() isn't it? > >>>> > >>> > >>> Sorry, yes. > >>> > >>> After looking at this further, I think the previous isert_cq_comp_err() > >>> patch still makes sense for the special logout response failure case, > >>> but as you've noted it does not address root cause of the original > >>> OOPsen. > >>> > >>> I'm now thinking it's related to complete(conn->conn_logout_comp) > >>> happening the start of iscsit_close_connection() (as originally intended > >>> for non-iser logout response failure case), that is causing > >>> isert_wait4logout() to immediately complete instead of allowing > >>> iscsit_logout_post_handler() to perform complete(conn->conn_logout_comp) > >>> after completion interrupt -> isert_do_control_comp() happens. > >>> > >>> This could result in iscsit_release_commands_from_conn() corrupting > >>> conn_cmd_list list when attempting to release the logout response > >>> before/during iser logout response completion interrupt handling. > >>> > >>> Here's a quick patch to test the theory. > >>> > >>> --nab > >>> > >>> diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c > >>> index 50bad55..ddbd022 100644 > >>> --- a/drivers/target/iscsi/iscsi_target.c > >>> +++ b/drivers/target/iscsi/iscsi_target.c > >>> @@ -4256,11 +4256,12 @@ int iscsit_close_connection( > >>> pr_debug("Closing iSCSI connection CID %hu on SID:" > >>> " %u\n", conn->cid, sess->sid); > >>> /* > >>> - * Always up conn_logout_comp just in case the RX Thread is sleeping > >>> - * and the logout response never got sent because the connection > >>> - * failed. > >>> + * Always up conn_logout_comp for the traditional TCP case just in case > >>> + * the RX Thread in iscsi_target_rx_opcode() is sleeping and the logout > >>> + * response never got sent because the connection failed. > >>> */ > >>> - complete(&conn->conn_logout_comp); > >>> + if (conn->conn_transport->transport_type == ISCSI_TCP) > >>> + complete(&conn->conn_logout_comp); > >>> > >>> iscsi_release_thread_set(conn); > >>> > >> > >> This does seem to make the list corruption go away. > > > > Thanks for the test feedback. > > > > This patch is queued in target-pending/master. > > > >> I increased the > >> session count to ~120 doing login/logout loop and at some point I am in > >> a point where I have 16066 iscsi_ttx and 16064 iscsi_trx threads > >> causing me to fail any other kthread creation (see dump_stack). > >> > >> CPU: 12 PID: 22517 Comm: iscsi_ttx Tainted: G E 3.19.0-rc1+ #34 > >> Hardware name: Supermicro SYS-1027R-WRF/X9DRW, BIOS 3.0a 08/08/2013 > >> 0000000000000000 ffff8804469dfdc8 ffffffff8153805c ffff8804469dfe58 > >> ffff8803a5e05700 ffff8804469dfdf8 ffffffffa053b4fe ffff8803b1e95028 > >> ffff8803b1e95000 ffff8803b1e95000 ffff8804469dfe58 ffff8804469dfe08 > >> Call Trace: > >> [<ffffffff8153805c>] dump_stack+0x48/0x5c > >> [<ffffffffa053b4fe>] iscsi_allocate_thread_sets+0x21e/0x280 > >> [iscsi_target_mod] > >> [<ffffffffa053b59a>] iscsi_check_to_add_additional_sets+0x3a/0x40 > >> [iscsi_target_mod] > >> [<ffffffffa053b691>] iscsi_tx_thread_pre_handler+0xf1/0x170 > >> [iscsi_target_mod] > >> [<ffffffffa054e0a7>] iscsi_target_tx_thread+0x47/0x220 [iscsi_target_mod] > >> [<ffffffff81538493>] ? __schedule+0x333/0x620 > >> [<ffffffffa054e060>] ? iscsit_handle_snack+0x180/0x180 [iscsi_target_mod] > >> [<ffffffff8106ac5e>] kthread+0xce/0xf0 > >> [<ffffffff8106ab90>] ? kthread_freezable_should_stop+0x70/0x70 > >> [<ffffffff8153beec>] ret_from_fork+0x7c/0xb0 > >> [<ffffffff8106ab90>] ? kthread_freezable_should_stop+0x70/0x70 > >> Unable to start iscsi_target_tx_thread > >> > >> For some reason the iscsi extra thread sets are not cleaned up well > >> and/or not reused from inactive list... > >> > > > > Please revert commit 72859d91, as it's incorrect per your earlier > > comments wrt iscsit_close_connection() never being called more than once > > during explicit shutdown. > > This reproduces with commit 72859d91 reverted. As per the off-list discussion, here's the long overdue patch to avoid thread_sets all together, and just perform kthread creation + deletion at login + shutdown time instead re-using existing kthreads across different connections. https://git.kernel.org/cgit/linux/kernel/git/nab/target-pending.git/commit/?id=7a7625e6ecbfc60826fb96a75dc2891b7145176e So far it's surviving initial login/logout + /etc/init.d/target restart tests with ISCSI_TCP. Please have a look. Thank you, --nab -- To unsubscribe from this list: send the line "unsubscribe target-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html