Re: Kernel crash with target-pending/for-next

"Nicholas A. Bellinger" <nab@xxxxxxxxxxxxxxx> · Fri, 02 Jun 2017 21:19:37 -0700

On Fri, 2017-06-02 at 16:30 +0000, Bart Van Assche wrote:
> On Thu, 2017-06-01 at 20:19 -0700, Nicholas A. Bellinger wrote:
> > Here's the updated version to restore original behavior for se_node_acl
> > delete, but still avoid the endless loop with the iscsi-target specific
> > case where se_node_acl->queue_depth changes.
> > 
> > Care to verify on ib_srpt, or just a report and never confirm..?
> 
> Hello Nic,
> 
> This is what I ran into with commit 4f61e1e687c4 ("target: Avoid
> target_shutdown_sessions loop during queue_depth change") merged with kernel
> v4.12-rc3. This is a crash I had never seen before. This crash disappears if
> I revert commit 4f61e1e687c4 so I think this indicates a bug introduced by
> that commit:
> 

Well, commit 4f61e1e687c4 does not change the original behavior of
draining the list of active se_node_acl sessions:

diff --git a/drivers/target/target_core_tpg.c b/drivers/target/target_core_tpg.c
index 3691373..1b2b60e 100644
--- a/drivers/target/target_core_tpg.c
+++ b/drivers/target/target_core_tpg.c
@@ -336,14 +336,14 @@ struct se_node_acl *core_tpg_add_initiator_node_acl(
        return acl;
 }
 
-static void target_shutdown_sessions(struct se_node_acl *acl)
+static void target_shutdown_sessions(struct se_node_acl *acl, bool do_restart)
 {
-       struct se_session *sess;
+       struct se_session *sess, *sess_tmp;
        unsigned long flags;
 
 restart:
        spin_lock_irqsave(&acl->nacl_sess_lock, flags);
-       list_for_each_entry(sess, &acl->acl_sess_list, sess_acl_list) {
+       list_for_each_entry_safe(sess, sess_tmp, &acl->acl_sess_list, sess_acl_list) {
                if (sess->sess_tearing_down)
                        continue;
 
@@ -352,7 +352,11 @@ static void target_shutdown_sessions(struct se_node_acl *acl)
 
                if (acl->se_tpg->se_tpg_tfo->close_session)
                        acl->se_tpg->se_tpg_tfo->close_session(sess);
-               goto restart;
+
+               if (do_restart)
+                       goto restart;
+
+               spin_lock_irqsave(&acl->nacl_sess_lock, flags);
        }
        spin_unlock_irqrestore(&acl->nacl_sess_lock, flags);
 }

That is, with do_restart set, target_shutdown_sessions() is doing the
same thing as before: walking se_node_acl->acl_sess_list, invoking
->close_session(), and immediately restarting the list walk after each
one.
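
For reference, the restart-style walk described above can be modeled in
userspace. This is only a minimal sketch, not the kernel code: struct
session, struct node_acl, acl_lock(), acl_unlock(), close_session() and
shutdown_sessions() are hypothetical stand-ins, the lock is modeled as a
bool so that assert() can check lock discipline, and marking the session
as tearing down before the callback is an assumption of the sketch (the
hunk does not show the lines between the check and the callback).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct se_session. */
struct session {
	struct session *next;
	bool tearing_down;	/* models sess->sess_tearing_down */
	int close_calls;	/* how often the close callback ran */
};

/* Hypothetical stand-in for struct se_node_acl. */
struct node_acl {
	struct session *sessions;	/* models acl->acl_sess_list */
	bool locked;			/* models acl->nacl_sess_lock */
};

static void acl_lock(struct node_acl *acl)
{
	assert(!acl->locked);	/* a real spinlock would deadlock here */
	acl->locked = true;
}

static void acl_unlock(struct node_acl *acl)
{
	assert(acl->locked);
	acl->locked = false;
}

/* Models ->close_session(); it must be called with the lock dropped. */
static void close_session(struct node_acl *acl, struct session *sess)
{
	assert(!acl->locked);
	sess->close_calls++;
}

/* Walk the list, close one session, then restart from the head. */
static int shutdown_sessions(struct node_acl *acl)
{
	struct session *sess;
	int closed = 0;

restart:
	acl_lock(acl);
	for (sess = acl->sessions; sess; sess = sess->next) {
		if (sess->tearing_down)
			continue;

		/* Assumption: mark it so the restarted walk skips it. */
		sess->tearing_down = true;
		acl_unlock(acl);

		close_session(acl, sess);
		closed++;
		goto restart;
	}
	acl_unlock(acl);
	return closed;
}
```

In this model each session that is not already tearing down gets exactly
one close_session() call, and the lock is never held across the
callback. A second call to shutdown_sessions() on the same list finds
nothing left to close.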

How can this change cause a crash in ib_srpt..?

> ib_srpt:srpt_close_ch: ib_srpt 0x0000000000000000e41d2d03000a6d51-1114: queued zerolength write
> ib_srpt:srpt_release_channel_work: ib_srpt srpt_release_channel_work: 0x0000000000000000e41d2d03000a6d51-1114; release_done =           (null)
> ------------[ cut here ]------------
> kernel BUG at drivers/infiniband/ulp/srpt/ib_srpt.c:2770!

Btw, looking at v4.12-rc3, there is no BUG_ON() at line 2770 of
ib_srpt.c.

Perhaps it's the BUG_ON(ch->release_done) at line 2719, which could
indicate srpt_close_session() is being called twice...

But if it is, then why isn't the srpt_close_session() pr_debug message
shown anywhere in your output..?

Can I have a look at the full debug output, including the missing
srpt_close_session() messages, to see if it's being called twice for the
same se_session? Please also send the code changes against v4.12-rc3
you're testing with that account for the ~50 line offset.
