Re: [PATCH] fc transport: resolve scan vs delete deadlocks

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Will this patch be integrated into scsi-misc or other tree at some point?

Thanks,
 Mike


James Smart wrote:
> In a prior posting to linux-scsi on the fc transport and workq
> deadlocks, we noted a second error that did not have a patch:
>   http://marc.theaimsgroup.com/?l=linux-scsi&m=114467847711383&w=2
>   - There's a deadlock where scsi_remove_target() has to sit behind 
>     scsi_scan_target() due to contention over the scan_lock().
> 
> Subsequently we posted a request for comments about the deadlock:
>   http://marc.theaimsgroup.com/?l=linux-scsi&m=114469358829500&w=2
> 
> This posting resolves the second error. Here's what we now understand,
> and are implementing:
> 
>   If the lldd deletes the rport while a scan is active, the sdev's queue
>   is blocked which stops the issuing of commands associated with the scan.
>   At this point, the scan stalls, and does so with the shost->scan_mutex held.
>   If, at this point, if any scan or delete request is made on the host, it
>   will stall waiting for the scan_mutex.
>   
>   For the FC transport, we queue all delete work to a single workq.
>   So, things worked fine when competing with the scan, as long as the
>   target blocking the scan was the same target at the top of our delete
>   workq, as the delete workq routine always unblocked just prior to
>   requesting the delete.  Unfortunately, if the top of our delete workq
>   was for a different target, we deadlock.  Additionally, if the target
>   blocking scan returned, we were unblocking it in the scan workq routine,
>   which really won't execute until the existing stalled scan workq
>   completes (e.g. we're re-scheduling it while it is in the midst of its
>   execution).
> 
>   This patch moves the unblock out of the workq routines and moves it to
>   the context that is scheduling the work. This ensures that at some point,
>   we will unblock the target that is blocking scan.  Please note, however,
>   that the deadlock condition may still occur while it waits for the
>   transport to timeout an unblock on a target.  Worst case, this is bounded
>   by the transport dev_loss_tmo (default: 30 seconds).
> 
> 
> Finally, Michael Reed deserves the credit for the bulk of this patch,
> analysis, and it's testing. Thank you for your help.
> 
> -- james s
> 
> Note: this patch supercedes a previous preliminary post by Michael Reed
>   http://marc.theaimsgroup.com/?l=linux-scsi&m=114675967824701&w=2
> 
> Note: The request for comments statements about the gross-ness of the
>   scan_mutex still stand.
> 
> 
> 
> Signed-off-by: Michael Reed <mdr@xxxxxxx>
> Signed-off-by: James Smart <james.smart@xxxxxxxxxx>
> 
> 
> diff -upNr a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
> --- a/drivers/scsi/scsi_transport_fc.c	2006-05-11 10:43:31.000000000 -0400
> +++ b/drivers/scsi/scsi_transport_fc.c	2006-05-11 12:04:36.000000000 -0400
> @@ -1284,7 +1284,9 @@ EXPORT_SYMBOL(fc_release_transport);
>   * @work:	Work to queue for execution.
>   *
>   * Return value:
> - * 	0 on success / != 0 for error
> + * 	1 - work queued for execution
> + *	0 - work is already queued
> + *	-EINVAL - work queue doesn't exist
>   **/
>  static int
>  fc_queue_work(struct Scsi_Host *shost, struct work_struct *work)
> @@ -1434,8 +1436,6 @@ fc_starget_delete(void *data)
>  	struct Scsi_Host *shost = rport_to_shost(rport);
>  	unsigned long flags;
>  
> -	scsi_target_unblock(&rport->dev);
> -
>  	spin_lock_irqsave(shost->host_lock, flags);
>  	if (rport->flags & FC_RPORT_DEVLOSS_PENDING) {
>  		spin_unlock_irqrestore(shost->host_lock, flags);
> @@ -1707,6 +1707,8 @@ fc_remote_port_add(struct Scsi_Host *sho
>  
>  				spin_unlock_irqrestore(shost->host_lock, flags);
>  
> +				scsi_target_unblock(&rport->dev);
> +
>  				return rport;
>  			}
>  		}
> @@ -1762,9 +1764,10 @@ fc_remote_port_add(struct Scsi_Host *sho
>  				/* initiate a scan of the target */
>  				rport->flags |= FC_RPORT_SCAN_PENDING;
>  				scsi_queue_work(shost, &rport->scan_work);
> -			}
> -
> -			spin_unlock_irqrestore(shost->host_lock, flags);
> +				spin_unlock_irqrestore(shost->host_lock, flags);
> +				scsi_target_unblock(&rport->dev);
> +			} else
> +				spin_unlock_irqrestore(shost->host_lock, flags);
>  
>  			return rport;
>  		}
> @@ -1938,6 +1941,7 @@ fc_remote_port_rolechg(struct fc_rport  
>  		rport->flags |= FC_RPORT_SCAN_PENDING;
>  		scsi_queue_work(shost, &rport->scan_work);
>  		spin_unlock_irqrestore(shost->host_lock, flags);
> +		scsi_target_unblock(&rport->dev);
>  	}
>  }
>  EXPORT_SYMBOL(fc_remote_port_rolechg);
> @@ -1970,8 +1974,9 @@ fc_timeout_deleted_rport(void  *data)
>  		dev_printk(KERN_ERR, &rport->dev,
>  			"blocked FC remote port time out: no longer"
>  			" a FCP target, removing starget\n");
> -		fc_queue_work(shost, &rport->stgt_delete_work);
>  		spin_unlock_irqrestore(shost->host_lock, flags);
> +		scsi_target_unblock(&rport->dev);
> +		fc_queue_work(shost, &rport->stgt_delete_work);
>  		return;
>  	}
>  
> @@ -2035,17 +2040,15 @@ fc_timeout_deleted_rport(void  *data)
>  	 * went away and didn't come back - we'll remove
>  	 * all attached scsi devices.
>  	 */
> -	fc_queue_work(shost, &rport->stgt_delete_work);
> -
>  	spin_unlock_irqrestore(shost->host_lock, flags);
> +
> +	scsi_target_unblock(&rport->dev);
> +	fc_queue_work(shost, &rport->stgt_delete_work);
>  }
>  
>  /**
>   * fc_scsi_scan_rport - called to perform a scsi scan on a remote port.
>   *
> - * Will unblock the target (in case it went away and has now come back),
> - * then invoke a scan.
> - *
>   * @data:	remote port to be scanned.
>   **/
>  static void
> @@ -2057,7 +2060,6 @@ fc_scsi_scan_rport(void *data)
>  
>  	if ((rport->port_state == FC_PORTSTATE_ONLINE) &&
>  	    (rport->roles & FC_RPORT_ROLE_FCP_TARGET)) {
> -		scsi_target_unblock(&rport->dev);
>  		scsi_scan_target(&rport->dev, rport->channel,
>  			rport->scsi_target_id, SCAN_WILD_CARD, 1);
>  	}
> 
> 
> -
> : send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
-
: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]
  Powered by Linux