Re: Deleting SCSI device on blocked rport

On 09/02/2010 06:05 AM, Christof Schmitt wrote:
Deleting a SCSI device on an rport in the state FC_PORTSTATE_BLOCKED,
before fast_io_fail_tmo expires, results in a hanging kernel
thread:

STACK TRACE FOR TASK: 0x2a368b38 (sysfsd)

  STACK:
  0 schedule+1108 [0x5cac48]
  1 schedule_timeout+528 [0x5cb7fc]
  2 wait_for_common+266 [0x5ca6be]
  3 blk_execute_rq+160 [0x354054]
  4 scsi_execute+324 [0x3b7ef4]
  5 scsi_execute_req+162 [0x3b80ca]
  6 sd_sync_cache+138 [0x3cf662]
  7 sd_shutdown+138 [0x3cf91a]
  8 sd_remove+112 [0x3cfe4c]
  9 __device_release_driver+124 [0x3a08b8]
10 device_release_driver+60 [0x3a0a5c]
11 bus_remove_device+266 [0x39fa76]
12 device_del+340 [0x39d818]
13 __scsi_remove_device+204 [0x3bcc48]
14 scsi_remove_device+66 [0x3bcc8e]
15 sysfs_schedule_callback_work+50 [0x260d66]
16 worker_thread+622 [0x162326]
17 kthread+160 [0x1680b0]
18 kernel_thread_starter+6 [0x10aaea]

Expiry of fast_io_fail_tmo or dev_loss_tmo does not change this,
so the hang has the potential to block the entire system.

Are you saying that if you delete the device and then one of those timers fires, nothing starts the queues? Is it because scsi_target_unblock does not see the device, since scsi_remove_device has already removed it from the device lists?


The request queue seems to be STOPPED at the moment.
         queue_flags = 0xa805


What causes the delete? Is it userspace or a scsi_remove_device by a LLD?

It looks like if the driver does fc_remove_host, it will call fc_rport_final_delete -> fc_terminate_rport_io -> scsi_target_unblock -> ... -> blk_start_queue, which clears the stopped bit in queue_flags and avoids the problem.
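
For reference, here is a small userspace sketch of how the reported queue_flags value decodes. It assumes QUEUE_FLAG_STOPPED is bit 2, as I believe it is in include/linux/blkdev.h for kernels of this vintage; the two helper names are made up for illustration and are not kernel API.

```c
#include <stdbool.h>

/* Assumption: bit number of QUEUE_FLAG_STOPPED in blkdev.h of this era. */
#define QUEUE_FLAG_STOPPED 2

/* Hypothetical helper: is the stopped bit set in a queue_flags value? */
static bool queue_is_stopped(unsigned long queue_flags)
{
	return (queue_flags >> QUEUE_FLAG_STOPPED) & 1UL;
}

/* Hypothetical helper: models only the flag change blk_start_queue makes. */
static unsigned long clear_stopped(unsigned long queue_flags)
{
	return queue_flags & ~(1UL << QUEUE_FLAG_STOPPED);
}
```

Under that assumption, queue_is_stopped(0xa805) is true, and clear_stopped(0xa805) gives 0xa801.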

And it looks like if fc_terminate_rport_io is called by the fast_io_fail or dev_loss_tmo handlers, that will call scsi_target_unblock too.

We hit something similar in iSCSI, because it used to loop over the devices in userspace and write to each device's delete sysfs attribute.


I am not sure how to approach this. One idea would be that the unblock
in fc_terminate_rport_io should also trigger the release of the
pending command, but it does not seem to happen.


I did the attached patch for iSCSI. It starts the queue and runs it. You still have to wait for the transport class to move from blocked to online or dead/not-present, so the queuecommand chkready functions can fail or finish the IO.
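
To make the intended sequence concrete, here is a toy userspace model, not kernel code. Every name in it is made up; it only assumes the behaviour described above: a stopped queue dispatches nothing, and once the queue is started a chkready-style check holds the request while the port is blocked and finishes or fails it afterwards.

```c
#include <stdbool.h>

/* Toy model, not kernel code: all names below are hypothetical. */
enum port_state { PORT_BLOCKED, PORT_ONLINE, PORT_DEAD };
enum io_result  { IO_PENDING, IO_DONE, IO_FAILED };

struct toy_queue {
	bool stopped;          /* models the stopped bit in queue_flags */
	enum port_state port;  /* models the transport class state */
	enum io_result req;    /* one queued request, e.g. the cache sync */
};

/* Models blk_start_queue: only clears the stopped bit. */
static void toy_start_queue(struct toy_queue *q)
{
	q->stopped = false;
}

/* Models running the queue: a chkready-style check decides the outcome. */
static void toy_run_queue(struct toy_queue *q)
{
	if (q->stopped || q->req != IO_PENDING)
		return;		/* stopped queue: nothing is dispatched */
	if (q->port == PORT_BLOCKED)
		return;		/* held back until the port leaves blocked */
	q->req = (q->port == PORT_ONLINE) ? IO_DONE : IO_FAILED;
}
```

In this model a request on a stopped queue stays pending even after the port goes dead, which corresponds to the hang in the trace. After toy_start_queue() the same request is failed as soon as the port leaves the blocked state.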

One thing I was worried about is the case where something stopped the queues and really did not want IO sent, but has no checks like the ones FC and iSCSI have. Maybe we want to set some state on the scsi device so that scsi_request_fn can check it and just fail IO immediately?
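
A rough userspace sketch of that second idea follows. It is only an illustration of the check, not a proposal for the actual code: the struct, the "removing" field, and the function name are all hypothetical, and the errno values are just placeholders for "fail now" versus "leave queued".

```c
#include <stdbool.h>
#include <errno.h>

/* Toy model: the field and function names are hypothetical. */
struct toy_sdev {
	bool queue_stopped;	/* someone stopped the queue */
	bool removing;		/* hypothetical "being deleted" state */
};

/* Returns 0 to dispatch, -EBUSY to leave queued, -ENODEV to fail now. */
static int toy_request_fn_check(const struct toy_sdev *sdev)
{
	if (sdev->removing)
		return -ENODEV;	/* fail IO immediately, never reaches the LLD */
	if (sdev->queue_stopped)
		return -EBUSY;	/* honour the stop: keep the request queued */
	return 0;
}
```

The point of the ordering is that a device being removed fails its IO without anything being sent, even if whoever stopped the queue has no chkready-style check of its own.
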
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 9ade720..41c2625 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -67,8 +67,6 @@ static struct scsi_host_sg_pool scsi_sg_pools[] = {
 
 struct kmem_cache *scsi_sdb_cache;
 
-static void scsi_run_queue(struct request_queue *q);
-
 /*
  * Function:	scsi_unprep_request()
  *
@@ -397,7 +395,7 @@ static inline int scsi_host_is_busy(struct Scsi_Host *shost)
  * Notes:	The previous command was completely finished, start
  *		a new one if possible.
  */
-static void scsi_run_queue(struct request_queue *q)
+void scsi_run_queue(struct request_queue *q)
 {
 	struct scsi_device *sdev = q->queuedata;
 	struct Scsi_Host *shost = sdev->host;
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index b4056d1..d041cdb 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -26,6 +26,7 @@ extern int scsi_init_hosts(void);
 extern void scsi_exit_hosts(void);
 
 /* scsi.c */
+extern void scsi_run_queue(struct request_queue *q);
 extern int scsi_dispatch_cmd(struct scsi_cmnd *cmd);
 extern int scsi_setup_command_freelist(struct Scsi_Host *shost);
 extern void scsi_destroy_command_freelist(struct Scsi_Host *shost);
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index c3f6737..b829ffc 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -923,6 +923,9 @@ void __scsi_remove_device(struct scsi_device *sdev)
 		if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
 			return;
 
+		blk_start_queue(sdev->request_queue);
+		scsi_run_queue(sdev->request_queue);
+
 		bsg_unregister_queue(sdev->request_queue);
 		device_unregister(&sdev->sdev_dev);
 		transport_remove_device(dev);
