Re: BUG in stress login-logout to multiple IQNs

"Nicholas A. Bellinger" <nab@xxxxxxxxxxxxxxx> · Fri, 27 Feb 2015 02:49:51 -0800

On Thu, 2015-02-26 at 14:28 +0200, Sagi Grimberg wrote:
> On 2/26/2015 9:47 AM, Nicholas A. Bellinger wrote:
> > On Wed, 2015-02-25 at 12:42 +0200, Sagi Grimberg wrote:
> >> On 2/23/2015 10:34 AM, Nicholas A. Bellinger wrote:
> >>> On Sun, 2015-02-22 at 18:36 +0200, Sagi Grimberg wrote:
> >>>> On 2/21/2015 9:54 AM, Nicholas A. Bellinger wrote:
> >
> > <SNIP>
> >
> >>>> iscsit_take_action_for_connection_exit() is invoked both by RX/TX
> >>>> threads. But only one should get to iscsit_close_connection() since
> >>>> conn->connection_exit is set under conn->state_lock. I'd say that if
> >>>> iscsit_close_connection() was invoked twice, the bug is in
> >>>> iscsit_take_action_for_connection_exit() isn't it?
> >>>>
> >>>
> >>> Sorry, yes.
> >>>
> >>> After looking at this further, I think the previous isert_cq_comp_err()
> >>> patch still makes sense for the special logout response failure case,
> >>> but as you've noted it does not address root cause of the original
> >>> OOPsen.
> >>>
> >>> I'm now thinking it's related to complete(conn->conn_logout_comp)
> >>> happening the start of iscsit_close_connection() (as originally intended
> >>> for non-iser logout response failure case), that is causing
> >>> isert_wait4logout() to immediately complete instead of allowing
> >>> iscsit_logout_post_handler() to perform complete(conn->conn_logout_comp)
> >>> after completion interrupt -> isert_do_control_comp() happens.
> >>>
> >>> This could result in iscsit_release_commands_from_conn() corrupting
> >>> conn_cmd_list list when attempting to release the logout response
> >>> before/during iser logout response completion interrupt handling.
> >>>
> >>> Here's a quick patch to test the theory.
> >>>
> >>> --nab
> >>>
> >>> diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
> >>> index 50bad55..ddbd022 100644
> >>> --- a/drivers/target/iscsi/iscsi_target.c
> >>> +++ b/drivers/target/iscsi/iscsi_target.c
> >>> @@ -4256,11 +4256,12 @@ int iscsit_close_connection(
> >>>           pr_debug("Closing iSCSI connection CID %hu on SID:"
> >>>                   " %u\n", conn->cid, sess->sid);
> >>>           /*
> >>> -        * Always up conn_logout_comp just in case the RX Thread is sleeping
> >>> -        * and the logout response never got sent because the connection
> >>> -        * failed.
> >>> +        * Always up conn_logout_comp for the traditional TCP case just in case
> >>> +        * the RX Thread in iscsi_target_rx_opcode() is sleeping and the logout
> >>> +        * response never got sent because the connection failed.
> >>>            */
> >>> -       complete(&conn->conn_logout_comp);
> >>> +       if (conn->conn_transport->transport_type == ISCSI_TCP)
> >>> +               complete(&conn->conn_logout_comp);
> >>>
> >>>           iscsi_release_thread_set(conn);
> >>>
> >>
> >> This does seem to make the list corruption go away.
> >
> > Thanks for the test feedback.
> >
> > This patch is queued in target-pending/master.
> >
> >> I increased the
> >> session count to ~120 doing login/logout loop and at some point I am in
> >> a point where I have 16066 iscsi_ttx and 16064 iscsi_trx threads
> >> causing me to fail any other kthread creation (see dump_stack).
> >>
> >> CPU: 12 PID: 22517 Comm: iscsi_ttx Tainted: G            E  3.19.0-rc1+ #34
> >> Hardware name: Supermicro SYS-1027R-WRF/X9DRW, BIOS 3.0a 08/08/2013
> >> 0000000000000000 ffff8804469dfdc8 ffffffff8153805c ffff8804469dfe58
> >> ffff8803a5e05700 ffff8804469dfdf8 ffffffffa053b4fe ffff8803b1e95028
> >> ffff8803b1e95000 ffff8803b1e95000 ffff8804469dfe58 ffff8804469dfe08
> >> Call Trace:
> >> [<ffffffff8153805c>] dump_stack+0x48/0x5c
> >> [<ffffffffa053b4fe>] iscsi_allocate_thread_sets+0x21e/0x280
> >> [iscsi_target_mod]
> >> [<ffffffffa053b59a>] iscsi_check_to_add_additional_sets+0x3a/0x40
> >> [iscsi_target_mod]
> >> [<ffffffffa053b691>] iscsi_tx_thread_pre_handler+0xf1/0x170
> >> [iscsi_target_mod]
> >> [<ffffffffa054e0a7>] iscsi_target_tx_thread+0x47/0x220 [iscsi_target_mod]
> >> [<ffffffff81538493>] ? __schedule+0x333/0x620
> >> [<ffffffffa054e060>] ? iscsit_handle_snack+0x180/0x180 [iscsi_target_mod]
> >> [<ffffffff8106ac5e>] kthread+0xce/0xf0
> >> [<ffffffff8106ab90>] ? kthread_freezable_should_stop+0x70/0x70
> >> [<ffffffff8153beec>] ret_from_fork+0x7c/0xb0
> >> [<ffffffff8106ab90>] ? kthread_freezable_should_stop+0x70/0x70
> >> Unable to start iscsi_target_tx_thread
> >>
> >> For some reason the iscsi extra thread sets are not cleaned up well
> >> and/or not reused from inactive list...
> >>
> >
> > Please revert commit 72859d91, as it's incorrect per your earlier
> > comments wrt iscsit_close_connection() never being called more than once
> > during explicit shutdown.
> 
> This reproduces with commit 72859d91 reverted.

As per the off-list discussion, here's the long overdue patch to avoid
thread_sets all together, and just perform kthread creation + deletion
at login + shutdown time instead re-using existing kthreads across
different connections.

https://git.kernel.org/cgit/linux/kernel/git/nab/target-pending.git/commit/?id=7a7625e6ecbfc60826fb96a75dc2891b7145176e

So far it's surviving initial login/logout + /etc/init.d/target restart
tests with ISCSI_TCP.

Please have a look.

Thank you,

--nab

--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html