Re: SCSI layer RPM deadlock debug suggestion

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




And that looks to solve the deadlock which I was seeing. I'm not sure on
side-effects elsewhere.

We'll test it a bit more.

In the absence of any bad reports, here is a proposal for a patch.

Comments?

Alan Stern

Hi Alan,

Sorry for not getting back to you sooner. Testing so far with the originally proposed change [0] has not raised any issues and has solved the deadlock.

But we have a list of other problems to deal with in the RPM area related to the LLDD/libsas, so were waiting to address all of them (or at least have a plan) before progressing this change.

One such issue is that when we issue the link-reset which causes the device to be lost in the test, the disk is not found again. The customer may not be happy with this, so we're investigating solutions.

As for your change itself, I had something similar sitting on our dev branch:

[0] https://github.com/hisilicon/kernel-dev/commit/3696ca85c1e00257c96e40154d28b936742430c4

For me, I'm happy to hold off on any change, but if you think it's serious enough to progress your patch, below, now, then I think that should be ok.

Thanks,
John




John Garry reported a deadlock that occurs when trying to access a
runtime-suspended SATA device.  For obscure reasons, the rescan
procedure causes the link to be hard-reset, which disconnects the
device.

The rescan tries to carry out a runtime resume when accessing the
device.  scsi_rescan_device() holds the SCSI device lock and won't
release it until it can put commands onto the device's block queue.
This can't happen until the queue is successfully runtime-resumed or
the device is unregistered.  But the runtime resume fails because the
device is disconnected, and __scsi_remove_device() can't do the
unregistration because it can't get the device lock.

The best way to resolve this deadlock appears to be to allow the block
queue to start running again even after an unsuccessful runtime
resume.  The idea is that the driver or the SCSI error handler will
need to be able to use the queue to resolve the runtime resume
failure.

This patch removes the err argument to blk_post_runtime_resume() and
makes the routine act as though the resume was successful always.
This fixes the deadlock.

Reported-and-tested-by: John Garry <john.garry@xxxxxxxxxx>
Signed-off-by: Alan Stern <stern@xxxxxxxxxxxxxxxxxxx>
Fixes: e27829dc92e5 ("scsi: serialize ->rescan against ->remove")
CC: Bart Van Assche <bvanassche@xxxxxxx>
CC: Hannes Reinecke <hare@xxxxxxx>
CC: <stable@xxxxxxxxxxxxxxx>

---

  block/blk-pm.c         |   22 +++++++---------------
  drivers/scsi/scsi_pm.c |    2 +-
  include/linux/blk-pm.h |    2 +-
  3 files changed, 9 insertions(+), 17 deletions(-)

Index: usb-devel/block/blk-pm.c
===================================================================
--- usb-devel.orig/block/blk-pm.c
+++ usb-devel/block/blk-pm.c
@@ -163,27 +163,19 @@ EXPORT_SYMBOL(blk_pre_runtime_resume);
  /**
   * blk_post_runtime_resume - Post runtime resume processing
   * @q: the queue of the device
- * @err: return value of the device's runtime_resume function
   *
   * Description:
- *    Update the queue's runtime status according to the return value of the
- *    device's runtime_resume function. If the resume was successful, call
- *    blk_set_runtime_active() to do the real work of restarting the queue.
+ *    For historical reasons, this routine merely calls blk_set_runtime_active()
+ *    to do the real work of restarting the queue.  It does this regardless of
+ *    whether the device's runtime-resume succeeded; even if it failed the
+ *    driver or error handler will need to communicate with the device.
   *
   *    This function should be called near the end of the device's
   *    runtime_resume callback.
   */
-void blk_post_runtime_resume(struct request_queue *q, int err)
+void blk_post_runtime_resume(struct request_queue *q)
  {
-	if (!q->dev)
-		return;
-	if (!err) {
-		blk_set_runtime_active(q);
-	} else {
-		spin_lock_irq(&q->queue_lock);
-		q->rpm_status = RPM_SUSPENDED;
-		spin_unlock_irq(&q->queue_lock);
-	}
+	blk_set_runtime_active(q);
  }
  EXPORT_SYMBOL(blk_post_runtime_resume);
@@ -201,7 +193,7 @@ EXPORT_SYMBOL(blk_post_runtime_resume);
   * runtime PM status and re-enable peeking requests from the queue. It
   * should be called before first request is added to the queue.
   *
- * This function is also called by blk_post_runtime_resume() for successful
+ * This function is also called by blk_post_runtime_resume() for
   * runtime resumes.  It does everything necessary to restart the queue.
   */
  void blk_set_runtime_active(struct request_queue *q)
Index: usb-devel/drivers/scsi/scsi_pm.c
===================================================================
--- usb-devel.orig/drivers/scsi/scsi_pm.c
+++ usb-devel/drivers/scsi/scsi_pm.c
@@ -262,7 +262,7 @@ static int sdev_runtime_resume(struct de
  	blk_pre_runtime_resume(sdev->request_queue);
  	if (pm && pm->runtime_resume)
  		err = pm->runtime_resume(dev);
-	blk_post_runtime_resume(sdev->request_queue, err);
+	blk_post_runtime_resume(sdev->request_queue);
return err;
  }
Index: usb-devel/include/linux/blk-pm.h
===================================================================
--- usb-devel.orig/include/linux/blk-pm.h
+++ usb-devel/include/linux/blk-pm.h
@@ -14,7 +14,7 @@ extern void blk_pm_runtime_init(struct r
  extern int blk_pre_runtime_suspend(struct request_queue *q);
  extern void blk_post_runtime_suspend(struct request_queue *q, int err);
  extern void blk_pre_runtime_resume(struct request_queue *q);
-extern void blk_post_runtime_resume(struct request_queue *q, int err);
+extern void blk_post_runtime_resume(struct request_queue *q);
  extern void blk_set_runtime_active(struct request_queue *q);
  #else
  static inline void blk_pm_runtime_init(struct request_queue *q,
.





[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]

  Powered by Linux