Re: Deleting SCSI device on blocked rport

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Sep 05, 2010 at 01:07:44PM -0500, Mike Christie wrote:
> On 09/02/2010 06:05 AM, Christof Schmitt wrote:
> >Deleting a SCSI device on a rport in the state FC_PORTSTATE_BLOCKED,
> >but before the fast_io_fail_tmo expires results in a hanging kernel
> >thread:
> >
> >STACK TRACE FOR TASK: 0x2a368b38 (sysfsd)
> >
> >  STACK:
> >  0 schedule+1108 [0x5cac48]
> >  1 schedule_timeout+528 [0x5cb7fc]
> >  2 wait_for_common+266 [0x5ca6be]
> >  3 blk_execute_rq+160 [0x354054]
> >  4 scsi_execute+324 [0x3b7ef4]
> >  5 scsi_execute_req+162 [0x3b80ca]
> >  6 sd_sync_cache+138 [0x3cf662]
> >  7 sd_shutdown+138 [0x3cf91a]
> >  8 sd_remove+112 [0x3cfe4c]
> >  9 __device_release_driver+124 [0x3a08b8]
> >10 device_release_driver+60 [0x3a0a5c]
> >11 bus_remove_device+266 [0x39fa76]
> >12 device_del+340 [0x39d818]
> >13 __scsi_remove_device+204 [0x3bcc48]
> >14 scsi_remove_device+66 [0x3bcc8e]
> >15 sysfs_schedule_callback_work+50 [0x260d66]
> >16 worker_thread+622 [0x162326]
> >17 kthread+160 [0x1680b0]
> >18 kernel_thread_starter+6 [0x10aaea]
> >
> >When the fast_io_fail_tmo or dev_loss_tmo expire, this does not
> >change, so this has the potential of blocking the entire system.
> 
> Are you saying if you delete the device then one of those timers
> fires, nothing starts the queues?

Yes.

> Is it because scsi_target_unblock
> is not seeing the devices, because the scsi_remove_device has
> already removed it from the device lists?

I did some more debugging today, the problem is here:
__scsi_remove_device changes the SCSI device state to SDEV_CANCEL. If
the rport is blocked, before the fast_io_fail timer fires, the queue
is stopped. The FC transport class later calls scsi_target_unblock to
get things moving again, but scsi_internal_device_unblock does not
call blk_start_queue for the device in SDEV_CANCEL. I am preparing a
patch that scsi_internal_device_unblock also calls blk_start_queue
for devices in SDEV_CANCEL.

> >The request queue seems to be STOPPED at the moment.
> >         queue_flags = 0xa805
> >
> 
> What causes the delete? Is it userspace or a scsi_remove_device by a LLD?

The delete is called from userspace:
echo 1 > /sys/bus/scsi/devices/0\:0\:20\:1087324176/delete

sdev_store_delete schedules the actual removal on the sysfsd
workqueue, which then hangs in the stack trace above.

> It looks like if the driver does fc_remove_host it will call
> fc_rport_final_delete->fc_terminate_rport_io->scsi_target_unblock..->
> blk_start_queue which clears the queue_flags stopped bit and avoids
> the problem.
> 
> And it looks like if fc_terminate_rport_io is called by fast_io_fail
> or the dev_loss_tmo handlers that will call scsi_target_unblock too.

Yes, but scsi_target_unblock does nothing for devices in SDEV_CANCEL,
as i learned today.

> We hit something similar in iscsi, because it used to loop over
> devices in userspace and would call the device's delete sysfs attr.
> 
> 
> >I am not sure how to approach this. One idea would be that the unblock
> >in fc_terminate_rport_io should also trigger the release of the
> >pending command, but it does not seem to happen.
> >
> 
> I did the attached patch for iscsi. It starts the queue and runs it.
> You still have to wait for the transport class to move from blocked
> to online or dead/not-present, so the queuecommand chkready
> functions can fail or finish the IO.
>
> One thing I was worried about was if something stopped the queues
> and really did not want IO sent, and did not have some checks like
> how FC and iSCSI do. Maybe we want to set some state on the scsi
> device so that scsi_request_fn can check it and just fail IO
> immediately?

The patch has the same effect as the approach to also start the queues
for devices in SDEV_CANCEL in scsi_internal_device_unblock. I do not
want to add more states to the SCSI devices, so handling this in
scsi_internal_device_unblock looks better to me.

> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 9ade720..41c2625 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -67,8 +67,6 @@ static struct scsi_host_sg_pool scsi_sg_pools[] = {
> 
>  struct kmem_cache *scsi_sdb_cache;
> 
> -static void scsi_run_queue(struct request_queue *q);
> -
>  /*
>   * Function:	scsi_unprep_request()
>   *
> @@ -397,7 +395,7 @@ static inline int scsi_host_is_busy(struct Scsi_Host *shost)
>   * Notes:	The previous command was completely finished, start
>   *		a new one if possible.
>   */
> -static void scsi_run_queue(struct request_queue *q)
> +void scsi_run_queue(struct request_queue *q)
>  {
>  	struct scsi_device *sdev = q->queuedata;
>  	struct Scsi_Host *shost = sdev->host;
> diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
> index b4056d1..d041cdb 100644
> --- a/drivers/scsi/scsi_priv.h
> +++ b/drivers/scsi/scsi_priv.h
> @@ -26,6 +26,7 @@ extern int scsi_init_hosts(void);
>  extern void scsi_exit_hosts(void);
> 
>  /* scsi.c */
> +extern void scsi_run_queue(struct request_queue *q);
>  extern int scsi_dispatch_cmd(struct scsi_cmnd *cmd);
>  extern int scsi_setup_command_freelist(struct Scsi_Host *shost);
>  extern void scsi_destroy_command_freelist(struct Scsi_Host *shost);
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index c3f6737..b829ffc 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -923,6 +923,9 @@ void __scsi_remove_device(struct scsi_device *sdev)
>  		if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
>  			return;
> 
> +		blk_start_queue(sdev->request_queue);
> +		scsi_run_queue(sdev->request_queue);
> +
>  		bsg_unregister_queue(sdev->request_queue);
>  		device_unregister(&sdev->sdev_dev);
>  		transport_remove_device(dev);

--
Christof
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]
  Powered by Linux