Re: BUG in stress login-logout to multiple IQNs

"Nicholas A. Bellinger" <nab@xxxxxxxxxxxxxxx> · Wed, 25 Feb 2015 23:47:43 -0800

On Wed, 2015-02-25 at 12:42 +0200, Sagi Grimberg wrote:
> On 2/23/2015 10:34 AM, Nicholas A. Bellinger wrote:
> > On Sun, 2015-02-22 at 18:36 +0200, Sagi Grimberg wrote:
> >> On 2/21/2015 9:54 AM, Nicholas A. Bellinger wrote:

<SNIP>

> >> iscsit_take_action_for_connection_exit() is invoked by both the RX and
> >> TX threads, but only one of them should reach iscsit_close_connection(),
> >> since conn->connection_exit is set under conn->state_lock. I'd say that
> >> if iscsit_close_connection() was invoked twice, the bug is in
> >> iscsit_take_action_for_connection_exit(), isn't it?
> >>
> >
> > Sorry, yes.
> >
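The single-closer rule described above can be modelled in user space. This is a minimal sketch that assumes pthreads and C11 atomics. struct model_conn and the model_*() helpers are hypothetical stand-ins for struct iscsi_conn, conn->state_lock, conn->connection_exit and iscsit_close_connection(). They are not the target code, which uses kernel locking primitives.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space model of a connection, reduced to the fields
 * that matter for the "only one exiting thread closes" rule. */
struct model_conn {
	pthread_mutex_t state_lock; /* stands in for conn->state_lock */
	bool connection_exit;       /* stands in for conn->connection_exit */
	atomic_int close_calls;     /* how many times the close path ran */
};

/* Stand-in for iscsit_close_connection(): just records that it ran. */
static void model_close_connection(struct model_conn *conn)
{
	atomic_fetch_add(&conn->close_calls, 1);
}

/*
 * Called by both the RX and TX threads on connection failure. The
 * test-and-set of connection_exit happens under state_lock, so exactly
 * one caller reaches the close path and the other returns early.
 */
static void model_take_action_for_connection_exit(struct model_conn *conn)
{
	pthread_mutex_lock(&conn->state_lock);
	if (conn->connection_exit) {
		pthread_mutex_unlock(&conn->state_lock);
		return; /* the other thread already owns teardown */
	}
	conn->connection_exit = true;
	pthread_mutex_unlock(&conn->state_lock);

	model_close_connection(conn);
}

static void *model_exit_thread(void *arg)
{
	model_take_action_for_connection_exit(arg);
	return NULL;
}

/* Race an "RX" and a "TX" thread; returns how often the close path ran. */
static int race_rx_tx_exit(void)
{
	struct model_conn conn = { .connection_exit = false };
	pthread_t rx, tx;
	int calls;

	pthread_mutex_init(&conn.state_lock, NULL);
	atomic_init(&conn.close_calls, 0);
	pthread_create(&rx, NULL, model_exit_thread, &conn);
	pthread_create(&tx, NULL, model_exit_thread, &conn);
	pthread_join(rx, NULL);
	pthread_join(tx, NULL);
	pthread_mutex_destroy(&conn.state_lock);
	calls = atomic_load(&conn.close_calls);
	return calls;
}
```

The check and the set happen in one critical section. Once one thread has claimed teardown, the other cannot see connection_exit as unset, so race_rx_tx_exit() should always report a single close.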
> > After looking at this further, I think the previous isert_cq_comp_err()
> > patch still makes sense for the special logout response failure case,
> > but as you've noted, it does not address the root cause of the original
> > OOPsen.
> >
> > I'm now thinking it's related to the complete(conn->conn_logout_comp)
> > that happens at the start of iscsit_close_connection() (originally
> > intended for the non-iSER logout response failure case). It causes
> > isert_wait4logout() to complete immediately, instead of allowing
> > iscsit_logout_post_handler() to perform complete(conn->conn_logout_comp)
> > after the completion interrupt has invoked isert_do_control_comp().
> >
> > This could result in iscsit_release_commands_from_conn() corrupting the
> > conn_cmd_list list when it attempts to release the logout response
> > before or during iSER logout response completion interrupt handling.
> >
> > Here's a quick patch to test the theory.
> >
> > --nab
> >
> > diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
> > index 50bad55..ddbd022 100644
> > --- a/drivers/target/iscsi/iscsi_target.c
> > +++ b/drivers/target/iscsi/iscsi_target.c
> > @@ -4256,11 +4256,12 @@ int iscsit_close_connection(
> >          pr_debug("Closing iSCSI connection CID %hu on SID:"
> >                  " %u\n", conn->cid, sess->sid);
> >          /*
> > -        * Always up conn_logout_comp just in case the RX Thread is sleeping
> > -        * and the logout response never got sent because the connection
> > -        * failed.
> > +        * Always up conn_logout_comp for the traditional TCP case just in case
> > +        * the RX Thread in iscsi_target_rx_opcode() is sleeping and the logout
> > +        * response never got sent because the connection failed.
> >           */
> > -       complete(&conn->conn_logout_comp);
> > +       if (conn->conn_transport->transport_type == ISCSI_TCP)
> > +               complete(&conn->conn_logout_comp);
> >
> >          iscsi_release_thread_set(conn);
> >
> 
> This does seem to make the list corruption go away.

Thanks for the test feedback.

This patch is queued in target-pending/master.
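The ordering problem the patch addresses can be sketched with a user-space analogue of struct completion. This minimal model assumes pthreads. The names model_completion, mc_*(), model_close_start() and waiter_released_early() are hypothetical, and mc_try_wait() plays the role of the kernel's try_wait_for_completion(). The model only shows when the logout waiter may proceed, before and after the transport_type check added in the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Minimal user-space stand-in for a kernel struct completion. */
struct model_completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	unsigned int done; /* complete() calls not yet consumed by a waiter */
};

static void mc_init(struct model_completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = 0;
}

static void mc_destroy(struct model_completion *c)
{
	pthread_cond_destroy(&c->cond);
	pthread_mutex_destroy(&c->lock);
}

static void mc_complete(struct model_completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done++;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/* Blocking wait, analogous to wait_for_completion(). */
static void mc_wait(struct model_completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (c->done == 0)
		pthread_cond_wait(&c->cond, &c->lock);
	c->done--;
	pthread_mutex_unlock(&c->lock);
}

/* Non-blocking check, analogous to try_wait_for_completion(). */
static bool mc_try_wait(struct model_completion *c)
{
	bool ok;

	pthread_mutex_lock(&c->lock);
	ok = c->done > 0;
	if (ok)
		c->done--;
	pthread_mutex_unlock(&c->lock);
	return ok;
}

enum model_transport { MODEL_TCP, MODEL_ISER };

/* Start of the close path: before the patch every transport signalled
 * here; with the patch only TCP does. */
static void model_close_start(struct model_completion *logout_comp,
			      enum model_transport t, bool patched)
{
	if (!patched || t == MODEL_TCP)
		mc_complete(logout_comp);
}

/*
 * Returns true if the logout waiter could proceed before the logout
 * response completion handler ran, i.e. before it is safe to release
 * the logout response command.
 */
static bool waiter_released_early(enum model_transport t, bool patched)
{
	struct model_completion logout_comp;
	bool early;

	mc_init(&logout_comp);
	model_close_start(&logout_comp, t, patched);
	early = mc_try_wait(&logout_comp);
	if (!early) {
		/* Completion interrupt -> post handler wakes the waiter. */
		mc_complete(&logout_comp);
		mc_wait(&logout_comp);
	}
	mc_destroy(&logout_comp);
	return early;
}
```

In this model, waiter_released_early(MODEL_ISER, false) returns true, meaning the waiter can run ahead of the logout response completion. With the patch applied, only the TCP case still gets the early wake-up.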

> I increased the
> session count to ~120 doing a login/logout loop, and at some point I reach
> a state with 16066 iscsi_ttx and 16064 iscsi_trx threads, which causes
> any further kthread creation to fail (see dump_stack).
> 
> CPU: 12 PID: 22517 Comm: iscsi_ttx Tainted: G            E  3.19.0-rc1+ #34
> Hardware name: Supermicro SYS-1027R-WRF/X9DRW, BIOS 3.0a 08/08/2013
> 0000000000000000 ffff8804469dfdc8 ffffffff8153805c ffff8804469dfe58
> ffff8803a5e05700 ffff8804469dfdf8 ffffffffa053b4fe ffff8803b1e95028
> ffff8803b1e95000 ffff8803b1e95000 ffff8804469dfe58 ffff8804469dfe08
> Call Trace:
> [<ffffffff8153805c>] dump_stack+0x48/0x5c
> [<ffffffffa053b4fe>] iscsi_allocate_thread_sets+0x21e/0x280 [iscsi_target_mod]
> [<ffffffffa053b59a>] iscsi_check_to_add_additional_sets+0x3a/0x40 [iscsi_target_mod]
> [<ffffffffa053b691>] iscsi_tx_thread_pre_handler+0xf1/0x170 [iscsi_target_mod]
> [<ffffffffa054e0a7>] iscsi_target_tx_thread+0x47/0x220 [iscsi_target_mod]
> [<ffffffff81538493>] ? __schedule+0x333/0x620
> [<ffffffffa054e060>] ? iscsit_handle_snack+0x180/0x180 [iscsi_target_mod]
> [<ffffffff8106ac5e>] kthread+0xce/0xf0
> [<ffffffff8106ab90>] ? kthread_freezable_should_stop+0x70/0x70
> [<ffffffff8153beec>] ret_from_fork+0x7c/0xb0
> [<ffffffff8106ab90>] ? kthread_freezable_should_stop+0x70/0x70
> Unable to start iscsi_target_tx_thread
> 
> For some reason, the extra iSCSI thread sets are not being cleaned up
> properly and/or are not being reused from the inactive list...
> 
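The reuse behaviour expected of the thread sets can be illustrated with a simple free-list pool. This is a hypothetical sketch, not the target's thread-set code. struct tset, struct tset_pool and the pool_*() and simulate() helpers are invented names, and here a "set" is only a list node rather than a pair of RX/TX kthreads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical thread set; a real one would own an RX and a TX thread. */
struct tset {
	struct tset *next;
};

struct tset_pool {
	struct tset *inactive; /* idle sets that a new login may reuse */
	struct tset *leaked;   /* sets never returned, so never reused */
	unsigned int created;  /* sets ever allocated, i.e. threads spawned */
};

/* Prefer an idle set; only allocate (spawn threads) if none is idle. */
static struct tset *pool_get(struct tset_pool *p)
{
	struct tset *ts = p->inactive;

	if (ts) {
		p->inactive = ts->next;
	} else {
		ts = malloc(sizeof(*ts));
		if (!ts)
			return NULL;
		p->created++;
	}
	ts->next = NULL;
	return ts;
}

/* Logout: put the set back on the inactive list for reuse. */
static void pool_put(struct tset_pool *p, struct tset *ts)
{
	ts->next = p->inactive;
	p->inactive = ts;
}

/* A logout path that never releases its set: the set is orphaned. */
static void pool_leak(struct tset_pool *p, struct tset *ts)
{
	ts->next = p->leaked;
	p->leaked = ts;
}

static void free_list(struct tset *ts)
{
	while (ts) {
		struct tset *next = ts->next;
		free(ts);
		ts = next;
	}
}

/*
 * Run `rounds` login/logout cycles over `sessions` sessions and return
 * how many sets were created. With release == false, every login has to
 * create fresh sets, so the count grows with every round.
 */
static unsigned int simulate(unsigned int sessions, unsigned int rounds,
			     bool release)
{
	struct tset_pool pool = { 0 };
	struct tset **held = calloc(sessions, sizeof(*held));
	unsigned int created;

	if (!held)
		return 0;
	for (unsigned int r = 0; r < rounds; r++) {
		for (unsigned int s = 0; s < sessions; s++)
			held[s] = pool_get(&pool); /* login */
		for (unsigned int s = 0; s < sessions; s++) {
			if (!held[s])
				continue;
			if (release)
				pool_put(&pool, held[s]); /* logout */
			else
				pool_leak(&pool, held[s]);
		}
	}
	created = pool.created;
	free_list(pool.inactive);
	free_list(pool.leaked);
	free(held);
	return created;
}
```

With release set, simulate(120, 10, true) creates only 120 sets, because later rounds reuse the inactive list. With release cleared, each round creates another 120, giving 1200 after ten rounds.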

Please revert commit 72859d91, as it's incorrect given your earlier
comments regarding iscsit_close_connection() never being called more
than once during explicit shutdown.

The same has been done in target-pending/master here:

https://git.kernel.org/cgit/linux/kernel/git/nab/target-pending.git/commit/?id=796cdd654e2b12010f3443aa039e4385906eacad

Apologies for not being explicit about this earlier.

--nab 

--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html