Re: SCSI layer RPM deadlock debug suggestion

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jul 14, 2021 at 05:48:36PM +0100, John Garry wrote:
> > > 
> > > And that looks to solve the deadlock which I was seeing. I'm not sure on
> > > side-effects elsewhere.
> > > 
> > > We'll test it a bit more.
> > 
> > In the absence of any bad reports, here is a proposal for a patch.
> > 
> > Comments?
> > 
> > Alan Stern
> 
> Hi Alan,
> 
> Sorry for not getting back to you sooner. Testing so far with the originally
> proposed change [0] has not raised any issues and has solved the deadlock.
> 
> But we have a list of other problems to deal with in the RPM area related to
> the LLDD/libsas, so were waiting to address all of them (or at least have a
> plan) before progressing this change.
> 
> One such issue is that when we issue the link-reset which causes the device
> to be lost in the test, the disk is not found again. The customer may not be
> happy with this, so we're investigating solutions.
> 
> As for your change itself, I had something similar sitting on our dev
> branch:
> 
> [0] https://github.com/hisilicon/kernel-dev/commit/3696ca85c1e00257c96e40154d28b936742430c4
> 
> For me, I'm happy to hold off on any change, but if you think it's serious
> enough to progress your patch, below, now, then I think that should be ok.
> 
> Thanks,
> John

John:

We seem to have forgotten all about this.  I just now noticed that 
this hadn't gotten in 5.15-rc3... and the reason is that it was never 
submitted!

What would you like to do?

Alan Stern

> 
> > 
> > 
> > 
> > John Garry reported a deadlock that occurs when trying to access a
> > runtime-suspended SATA device.  For obscure reasons, the rescan
> > procedure causes the link to be hard-reset, which disconnects the
> > device.
> > 
> > The rescan tries to carry out a runtime resume when accessing the
> > device.  scsi_rescan_device() holds the SCSI device lock and won't
> > release it until it can put commands onto the device's block queue.
> > This can't happen until the queue is successfully runtime-resumed or
> > the device is unregistered.  But the runtime resume fails because the
> > device is disconnected, and __scsi_remove_device() can't do the
> > unregistration because it can't get the device lock.
> > 
> > The best way to resolve this deadlock appears to be to allow the block
> > queue to start running again even after an unsuccessful runtime
> > resume.  The idea is that the driver or the SCSI error handler will
> > need to be able to use the queue to resolve the runtime resume
> > failure.
> > 
> > This patch removes the err argument to blk_post_runtime_resume() and
> > makes the routine act as though the resume was successful always.
> > This fixes the deadlock.
> > 
> > Reported-and-tested-by: John Garry <john.garry@xxxxxxxxxx>
> > Signed-off-by: Alan Stern <stern@xxxxxxxxxxxxxxxxxxx>
> > Fixes: e27829dc92e5 ("scsi: serialize ->rescan against ->remove")
> > CC: Bart Van Assche <bvanassche@xxxxxxx>
> > CC: Hannes Reinecke <hare@xxxxxxx>
> > CC: <stable@xxxxxxxxxxxxxxx>
> > 
> > ---
> > 
> >   block/blk-pm.c         |   22 +++++++---------------
> >   drivers/scsi/scsi_pm.c |    2 +-
> >   include/linux/blk-pm.h |    2 +-
> >   3 files changed, 9 insertions(+), 17 deletions(-)
> > 
> > Index: usb-devel/block/blk-pm.c
> > ===================================================================
> > --- usb-devel.orig/block/blk-pm.c
> > +++ usb-devel/block/blk-pm.c
> > @@ -163,27 +163,19 @@ EXPORT_SYMBOL(blk_pre_runtime_resume);
> >   /**
> >    * blk_post_runtime_resume - Post runtime resume processing
> >    * @q: the queue of the device
> > - * @err: return value of the device's runtime_resume function
> >    *
> >    * Description:
> > - *    Update the queue's runtime status according to the return value of the
> > - *    device's runtime_resume function. If the resume was successful, call
> > - *    blk_set_runtime_active() to do the real work of restarting the queue.
> > + *    For historical reasons, this routine merely calls blk_set_runtime_active()
> > + *    to do the real work of restarting the queue.  It does this regardless of
> > + *    whether the device's runtime-resume succeeded; even if it failed the
> > + *    driver or error handler will need to communicate with the device.
> >    *
> >    *    This function should be called near the end of the device's
> >    *    runtime_resume callback.
> >    */
> > -void blk_post_runtime_resume(struct request_queue *q, int err)
> > +void blk_post_runtime_resume(struct request_queue *q)
> >   {
> > -	if (!q->dev)
> > -		return;
> > -	if (!err) {
> > -		blk_set_runtime_active(q);
> > -	} else {
> > -		spin_lock_irq(&q->queue_lock);
> > -		q->rpm_status = RPM_SUSPENDED;
> > -		spin_unlock_irq(&q->queue_lock);
> > -	}
> > +	blk_set_runtime_active(q);
> >   }
> >   EXPORT_SYMBOL(blk_post_runtime_resume);
> > @@ -201,7 +193,7 @@ EXPORT_SYMBOL(blk_post_runtime_resume);
> >    * runtime PM status and re-enable peeking requests from the queue. It
> >    * should be called before first request is added to the queue.
> >    *
> > - * This function is also called by blk_post_runtime_resume() for successful
> > + * This function is also called by blk_post_runtime_resume() for
> >    * runtime resumes.  It does everything necessary to restart the queue.
> >    */
> >   void blk_set_runtime_active(struct request_queue *q)
> > Index: usb-devel/drivers/scsi/scsi_pm.c
> > ===================================================================
> > --- usb-devel.orig/drivers/scsi/scsi_pm.c
> > +++ usb-devel/drivers/scsi/scsi_pm.c
> > @@ -262,7 +262,7 @@ static int sdev_runtime_resume(struct de
> >   	blk_pre_runtime_resume(sdev->request_queue);
> >   	if (pm && pm->runtime_resume)
> >   		err = pm->runtime_resume(dev);
> > -	blk_post_runtime_resume(sdev->request_queue, err);
> > +	blk_post_runtime_resume(sdev->request_queue);
> >   	return err;
> >   }
> > Index: usb-devel/include/linux/blk-pm.h
> > ===================================================================
> > --- usb-devel.orig/include/linux/blk-pm.h
> > +++ usb-devel/include/linux/blk-pm.h
> > @@ -14,7 +14,7 @@ extern void blk_pm_runtime_init(struct r
> >   extern int blk_pre_runtime_suspend(struct request_queue *q);
> >   extern void blk_post_runtime_suspend(struct request_queue *q, int err);
> >   extern void blk_pre_runtime_resume(struct request_queue *q);
> > -extern void blk_post_runtime_resume(struct request_queue *q, int err);
> > +extern void blk_post_runtime_resume(struct request_queue *q);
> >   extern void blk_set_runtime_active(struct request_queue *q);
> >   #else
> >   static inline void blk_pm_runtime_init(struct request_queue *q,
> > .
> > 
> 



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]

  Powered by Linux