Re: [PATCH] fc transport: pre-emptively terminate i/o upon dev_loss_tmo

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Mike Christie wrote:
Mike Christie wrote:
Ralph Wuerthner wrote:
Mike Christie wrote:
James Smart wrote:
Pre-emptively terminate i/o on the rport if dev_loss_tmo has fired.
The desire is to terminate everything, so that the i/o is cleaned up
prior to the sdev's being unblocked, thus any outstanding timeouts/aborts
are avoided.

Also, we do this early enough such that the rport's port_id field is
still valid. FCOE libFC code needs this info to find the i/o's to
terminate.

-- james s

 Signed-off-by: James Smart <james.smart@xxxxxxxxxx>

 ---

 scsi_transport_fc.c |   11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)


diff -upNr a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c --- a/drivers/scsi/scsi_transport_fc.c 2008-10-18 10:32:52.000000000 -0400 +++ b/drivers/scsi/scsi_transport_fc.c 2008-12-05 12:13:54.000000000 -0500
@@ -3012,6 +3012,16 @@ fc_timeout_deleted_rport(struct work_str
     rport->port_state = FC_PORTSTATE_NOTPRESENT;
     rport->flags &= ~FC_RPORT_FAST_FAIL_TIMEDOUT;
+ /*
+     * Pre-emptively kill I/O rather than waiting for the work queue
+     * item to teardown the starget. (FCOE libFC folks prefer this
+     * and to have the rport_port_id still set when it's done).
+     */
+    spin_unlock_irqrestore(shost->host_lock, flags);
+    fc_terminate_rport_io(rport);
+
+    BUG_ON(rport->port_state != FC_PORTSTATE_NOTPRESENT);
+
I think we could this this bug_on or we might hit a problem below where this thread is changing the port_id but some other thread is calling fc_remote_port_add and changging the port id.

I think the problem is that fc_remote_port_add only calls fc_flush_work which flushes the fc_host work_q:


struct fc_rport *
() (struct Scsi_Host *shost, int channel,
         struct fc_rport_identifiers  *ids)
{
         struct fc_internal *fci = to_fc_internal(shost->transportt);
         struct fc_host_attrs *fc_host = shost_to_fc_host(shost);
         struct fc_rport *rport;
         unsigned long flags;
         int match = 0;

         /* ensure any stgt delete functions are done */
         fc_flush_work(shost);


But fc_timeout_deleted_rport dev_loss_work is running on the devloss_work_q.

So if fc_timeout_deleted_rport grabs the host lock first, it will set the port state to FC_PORTSTATE_NOTPRESENT, then drop the lock.

Once fc_timeout_delete_rport drops the lock, fc_remote_port_add could have already passed the fc_flush_work() call because there was nothing on it. fc_remote_port_add would then drop down to the list search on the bindings array and see the remote port. It would then start setting the port state, id and port_name and node_name, but at the same time, because the host lock no longer guards it, fc_timedout_deleted_rport could be fiddling with the same fields and we could end up a mix of values or it could be running over the BUG_ON.

Is that right?

If so do we just want to flush the devloss_work_queue in fc_remote_port_add?



/* remove the identifiers that aren't used in the consisting binding */
     switch (fc_host->tgtid_bind_type) {
     case FC_TGTID_BIND_BY_WWPN:
@@ -3035,7 +3045,6 @@ fc_timeout_deleted_rport(struct work_str
      * went away and didn't come back - we'll remove
      * all attached scsi devices.
      */
-    spin_unlock_irqrestore(shost->host_lock, flags);
scsi_target_unblock(&rport->dev);
     fc_queue_work(shost, &rport->stgt_delete_work);


One of our tester hit the problem Michael describes above: right after
releasing the shost->host_lock lock in fc_timeout_deleted_rport() the
remote port was rediscovered by the LLDD (in our case zfcp),
fc_remote_port_add() was called and BUG_ON() triggered.

But flushing the devloss_work_queue in fc_remote_port_add() won't help
either because it still would be possible that right after the
flush_workqueue() call a timer could trigger and fc_timeout_deleted_rport()


What timer could that be? Only the dev_loss_work delayed work calls fc_timeout_deleted_rport.

Are you thinking about fc_rport_final_delete?

If you do the
                if (!cancel_delayed_work(&rport->dev_loss_work))
                        fc_flush_devloss(shost);

sequence we should be ok I think, because after that point it should have either been canceled or the work has completed.


I did this in the attached patch. It also will prevent drivers from having to worry about about if the terminate rport IO or dev loss callback is running at the same time fc_remote_port_add is run.


I forgot to cancel the fail io work too. I guess we can also remove the same calls later in that function.
diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
index a152f89..e7bbc65 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -2564,6 +2564,14 @@ fc_remote_port_add(struct Scsi_Host *shost, int channel,
 	/* ensure any stgt delete functions are done */
 	fc_flush_work(shost);
 
+	/* ensure fail fast is done */
+	if (!cancel_delayed_work(&rport->fail_io_work))
+		fc_flush_devloss(shost);
+
+	/* ensure dev loss is done */
+	if (!cancel_delayed_work(&rport->dev_loss_work))
+		fc_flush_devloss(shost);
+
 	/*
 	 * Search the list of "active" rports, for an rport that has been
 	 * deleted, but we've held off the real delete while the target
@@ -2630,16 +2638,6 @@ fc_remote_port_add(struct Scsi_Host *shost, int channel,
 				    (!(ids->roles & FC_PORT_ROLE_FCP_TARGET)))
 					return rport;
 
-				/*
-				 * Stop the fail io and dev_loss timers.
-				 * If they flush, the port_state will
-				 * be checked and will NOOP the function.
-				 */
-				if (!cancel_delayed_work(&rport->fail_io_work))
-					fc_flush_devloss(shost);
-				if (!cancel_delayed_work(&rport->dev_loss_work))
-					fc_flush_devloss(shost);
-
 				spin_lock_irqsave(shost->host_lock, flags);
 
 				rport->flags &= ~(FC_RPORT_FAST_FAIL_TIMEDOUT |

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]
  Powered by Linux