[PATCH v2][SCSI] scsi_dh: propagate SCSI device deletion

Mike Snitzer <snitzer@xxxxxxxxxx> · Thu, 16 Dec 2010 10:29:31 -0500

On Thu, Dec 16 2010 at  9:21am -0500,
Menny_Hamburger@xxxxxxxx <Menny_Hamburger@xxxxxxxx> wrote:

> Mike,
> 
> What I meant is that removing the dm-mpath change from the patch makes
> the operation path go into an area that I did not test (the default
> case). 
> I know it's only a !hw_handler_name and a printk (as I said before -
> this might sound superstitious), however I did my tests with the
> dm-mpath part, and I prefer to send a 100% tested patch then to add
> something I did not test.
> My suggestion was to supply a scsi_dh only patch that goes through the
> same operation path as the one I tested - via the SCSI_DH_NOSYS case
> in pg_init_done. In addition, the separation of offline from
> cancel/del also seems the correct way to do this.

I see, my misunderstanding.

- But you're still returning SCSI_DH_DEV_OFFLINED; so if dm mpath's
  pg_init_done isn't updated to explicitly handle it it'll just
  fail_path() via the default case.  Are you OK with that?

- In your initial patch you said: "When running an upstream kernel,
  the above scenario may not occur because the request queue is aborted
  when the multipath fails the path."

  I'm missing _why_ fail_path's call to blk_abort_queue() would obviate
  the need for this patch.  blk_abort_queue() is only called from
  fail_path().  But I thought the problem is fail_path() isn't called in
  this case where the device is SCSI_DH_NOSYS or SCSI_DH_DEV_OFFLINED?
  -- due to the missing call to activate_complete callback (pg_init_done)
  (again, I'm very interested in this because we'll be reverting DM
   mpath's call to blk_abort_queue in the near future). 

But this would be the revised scsi_dh patch (I'm not sending to
linux-scsi until I have an answer for the above concerns):


From: Menny Hamburger <Menny_Hamburger@xxxxxxxx>

When the scsi_dh_activate returns SCSI_DH_NOSYS the activate_complete
callback is not called and the error is not propagated to DM mpath.

When a SCSI device attached to a device handler is deleted, userland
processes currently performing I/O on the device will have their I/O
hang forever.  

- Set SCSI_DH_NOSYS error when the handler is in the process of being
  deleted (e.g. the SCSI device is in a SDEV_CANCEL or SDEV_DEL state).

- Set SCSI_DH_DEV_OFFLINED error when device is in SDEV_OFFLINE state.

- Call the activate_complete callback function directly from
  scsi_dh_activate if an error has been set (when either the scsi_dh
  internal data has already been deleted or is in the process of being
  deleted).

The patch was tested in an ISCSI environment, RDAC H/W handler and
multipath.
In the following reproduction process, dd will I/O hang forever and the
only way to release it will be to reboot the machine:
1) Perform I/O on a multipath device:
    dd if=/dev/dm-0 of=/dev/zero bs=8k count=1000000 &
2) Delete all slave SCSI devices contained in the mpath device:
   I)  In an ISCSI environment, the easiest way to do this is by
   stopping ISCSI:
       /etc/init.d/iscsi stop
   II) Another way to delete the devices is by applying the following
   bash scriptlet:
       dm_devs=$(ls /sys/block/ | grep dm- | xargs)
       for dm_dev in $dm_devs; do
         devices=$(ls /sys/block/$dm_dev/slaves)
         for device in $devices; do
            echo 1 > /sys/block/$device/device/delete
         done
       done

NOTE: when DM mpath uses blk_abort_queue this scsi_dh change isn't
required.  However, DM mpath's call to blk_abort_queue has proven to be
unsafe due to a race (between blk_abort_queue and scsi_request_fn) that
can lead to list corruption.  Therefore we cannot rely on
blk_abort_queue (dm mpath's blk_abort_queue call will be reverted and
will only be restored once the race with scsi_request_fn is fixed).

Signed-off-by: Menny Hamburger <Menny_Hamburger@xxxxxxxx>
Signed-off-by: Mike Snitzer <snitzer@xxxxxxxxxx>
---
 drivers/scsi/device_handler/scsi_dh.c |   11 +++++++++--
 1 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/device_handler/scsi_dh.c b/drivers/scsi/device_handler/scsi_dh.c
index 6fae3d2..b0c56f6 100644
--- a/drivers/scsi/device_handler/scsi_dh.c
+++ b/drivers/scsi/device_handler/scsi_dh.c
@@ -442,12 +442,19 @@ int scsi_dh_activate(struct request_queue *q, activate_complete fn, void *data)
 	sdev = q->queuedata;
 	if (sdev && sdev->scsi_dh_data)
 		scsi_dh = sdev->scsi_dh_data->scsi_dh;
-	if (!scsi_dh || !get_device(&sdev->sdev_gendev))
+	if (!scsi_dh || !get_device(&sdev->sdev_gendev) ||
+	    sdev->sdev_state == SDEV_CANCEL ||
+	    sdev->sdev_state == SDEV_DEL)
 		err = SCSI_DH_NOSYS;
+	if (sdev->sdev_state == SDEV_OFFLINE)
+		err = SCSI_DH_DEV_OFFLINED;
 	spin_unlock_irqrestore(q->queue_lock, flags);
 
-	if (err)
+	if (err) {
+		if (fn)
+			fn(data, err);
 		return err;
+	}
 
 	if (scsi_dh->activate)
 		err = scsi_dh->activate(sdev, fn, data);

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel