Re: [PATCH 1/2] iser-target: Fix multi network portal shutdown regression

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 2014-06-05 at 12:28 +0300, Sagi Grimberg wrote:
> On 6/4/2014 4:17 AM, Nicholas A. Bellinger wrote:
> > From: Nicholas Bellinger <nab@xxxxxxxxxxxxxxx>
> >
> > This patch fixes a iser-target specific regression introduced in
> > v3.15-rc6 with:
> >
> > commit 14f4b54fe38f3a8f8392a50b951c8aa43b63687a
> > Author: Sagi Grimberg <sagig@xxxxxxxxxxxx>
> > Date:   Tue Apr 29 13:13:47 2014 +0300
> >
> >      Target/iscsi,iser: Avoid accepting transport connections during stop stage
> >
> > where the change to set iscsi_np->enabled = false within
> > iscsit_clear_tpg_np_login_thread() meant that a iscsi_np with
> > two iscsi_tpg_np exports would have it's parent iscsi_np set
> > to a disabled state, even if other iscsi_tpg_np exports still
> > existed.
> >
> > This patch changes iscsit_clear_tpg_np_login_thread() to only
> > set iscsi_np->enabled = false when shutdown = true, and also
> > changes iscsit_del_np() to set iscsi_np->enabled = true when
> > iscsi_np->np_exports is non zero.
> 
> Hey Nic,
> 
> Can you please elaborate on what is the phenomenon and how you were able 
> to reproduce it?
> I ran several long runs with unstable ports and target stack shutdowns 
> over 10 targets each exposing 4 NPs.
> The code seemed to survive it. I would like to extend our QA test-suite 
> to hit this bug.

Easy.  Create multiple targets that share the same network portal, then
remove one of the network portals from one of the targets.

>From there, all logins on the targets with the network portal in
question will be rejected because np->enabled = false.

> 
> > Cc: Sagi Grimberg <sagig@xxxxxxxxxxxxxxxxxx>
> > Cc: stable@xxxxxxxxxxxxxxx # 3.10+
> > Signed-off-by: Nicholas Bellinger <nab@xxxxxxxxxxxxxxx>
> > ---
> >   drivers/target/iscsi/iscsi_target.c     |    1 +
> >   drivers/target/iscsi/iscsi_target_tpg.c |    3 ++-
> >   2 files changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
> > index 46588c8..9189bc0 100644
> > --- a/drivers/target/iscsi/iscsi_target.c
> > +++ b/drivers/target/iscsi/iscsi_target.c
> > @@ -460,6 +460,7 @@ int iscsit_del_np(struct iscsi_np *np)
> >   	spin_lock_bh(&np->np_thread_lock);
> >   	np->np_exports--;
> >   	if (np->np_exports) {
> > +		np->enabled = true;
> 
> Shouldn't this be already true?

No, that is the regression.

The original patch did not take into account that a single iscsi_np
instance can be shared across multiple iscsi_tpg_nps, so when one
iscsi_tpg_np was shutdown iscsi_np->enabled = false was set, and never
reset to iscsi_np->enabled = true, causing all other targets that where
still using the same iscsi_np to be unable to accept logins.

> 
> >   		spin_unlock_bh(&np->np_thread_lock);
> >   		return 0;
> >   	}
> > diff --git a/drivers/target/iscsi/iscsi_target_tpg.c b/drivers/target/iscsi/iscsi_target_tpg.c
> > index ca18118..1431e84 100644
> > --- a/drivers/target/iscsi/iscsi_target_tpg.c
> > +++ b/drivers/target/iscsi/iscsi_target_tpg.c
> > @@ -184,7 +184,8 @@ static void iscsit_clear_tpg_np_login_thread(
> >   		return;
> >   	}
> >   
> > -	tpg_np->tpg_np->enabled = false;
> > +	if (shutdown)
> > +		tpg_np->tpg_np->enabled = false;
> >   	iscsit_reset_np_thread(tpg_np->tpg_np, tpg_np, tpg, shutdown);
> >   }
> >   
> 
> The enabled boolean was made for the transport (iser) to understand if 
> it can accept transport connection requests (cma).
> In case the np_thread is during reset stage, shouldn't iser reject 
> connect requests?

There are two scenarios here.

The first is where iscsit_clear_tpg_np_login_thread() is called with
shutdown=false from iscsit_tpg_disable_portal_group() context, where the
TPG state is explicitly set to TPG_STATE_INACTIVE before hand, causing
any new logins to fail with error status in iscsi_target_locate_portal()
-> iscsit_get_tpg_from_np().  So not setting iscsi_np->enabled = false
here is OK because the network portal itself is not being shutdown, just
the target endpoint.

The second is where iscsit_clear_tpg_np_login_thread() with
shutdown=true is called from iscsit_tpg_release_np(), because the actual
iscsi_tpg_np group is being removed from configfs.

It's the latter case where disallowing logins to the parent iscsi_np
makes sense until the individual iscsi_tpg_np is released, but if other
iscsi_tpg_np are still using the iscsi_np, then ->enabled = true needs
to be reset in order for logins on other iscsi_tpg_nps to be processed.

> 
> I have yet to test this code, have you tried running with a lot of 
> target instances?

Yes, using 32 targets and 32 network portals.

> In this case it's more likely that during the np reset stage the 
> initiator will retry to connect (CM connect requests to np in reset stage).
> 

Agreed.

--nab

--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux SCSI]     [Kernel Newbies]     [Linux SCSI Target Infrastructure]     [Share Photos]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Device Mapper]

  Powered by Linux