Re: [PATCH v1 1/1] scsi: Synchronize request queue PM status only on successful resume

Stanley Chu <stanley.chu@xxxxxxxxxxxx> · Thu, 3 Jan 2019 14:38:47 +0800

Hi Bart,

On Wed, 2019-01-02 at 08:15 -0800, Bart Van Assche wrote:
> On Wed, 2019-01-02 at 14:25 +0800, stanley.chu@xxxxxxxxxxxx wrote:
> > From: Stanley Chu <stanley.chu@xxxxxxxxxxxx>
> > 
> > The commit 356fd2663cff ("scsi: Set request queue runtime PM status
> > back to active on resume") fixed up the inconsistent RPM status between
> > request queue and device. However changing request queue RPM status
> > shall be done only on successful resume, otherwise status may be still
> > inconsistent as below,
> > 
> > Request queue: RPM_ACTIVE
> > Device: RPM_SUSPENDED
> > 
> > This ends up soft lockup because requests can be submitted to
> > underlying devices but those devices and their required resource
> > are not resumed.
> > 
> > Signed-off-by: Stanley Chu <stanley.chu@xxxxxxxxxxxx>
> 
> Please add "Fixes:" and "Cc: stable" tags and also Cc the author of commit
> 356fd2663cff.

Sure. Thanks for remind.

> 
> 
> > ---
> >  drivers/scsi/scsi_pm.c | 24 ++++++++++++++----------
> >  1 file changed, 14 insertions(+), 10 deletions(-)
> > 
> > diff --git a/drivers/scsi/scsi_pm.c b/drivers/scsi/scsi_pm.c
> > index a2b4179..eff3e59 100644
> > --- a/drivers/scsi/scsi_pm.c
> > +++ b/drivers/scsi/scsi_pm.c
> > @@ -82,6 +82,20 @@ static int scsi_dev_type_resume(struct device *dev,
> >  		pm_runtime_disable(dev);
> >  		pm_runtime_set_active(dev);
> >  		pm_runtime_enable(dev);
> > +
> > +		/*
> > +		 * Forcibly set runtime PM status of request queue to "active"
> > +		 * to make sure we can again get requests from the queue
> > +		 * (see also blk_pm_peek_request()).
> > +		 *
> > +		 * The resume hook will correct runtime PM status of the disk.
> > +		 */
> > +		if (!err && scsi_is_sdev_device(dev)) {
> > +			struct scsi_device *sdev = to_scsi_device(dev);
> > +
> > +			if (sdev->request_queue->dev)
> > +				blk_set_runtime_active(sdev->request_queue);
> > +		}
> 
> What makes you think that the sdev->request_queue->dev test is necessary? The
> scsi_dev_type_resume() function is only called after blk_pm_runtime_init() has
> finished so I don't think that test is necessary.

We found NULL sdev->request_queue->dev may be dereferenced during below
system resume flow,

scsi_bus_resume_common()
 => async_schedule_domain(async_sdev_resume)

And then async_sdev_resume() is invoked asynchronously,

async_sdev_resume()
 => scsi_dev_type_resume(dev, do_scsi_resume)
  => blk_set_runtime_active(sdev->request_queue)

If a SCSI device does not have upper layer driver (like SCSI disk), it
may not be applied blk_pm_runtime_init() invoked by sd_probe() while
this SCSI device is added.

For example, some SCSI devices (like UFS Boot W-LUN) are added
explicitly in __scsi_add_device() by ufshcd_scsi_add_wlus() first and
thus sd_probe() for them is skipped because they are already visible.

For those SCSI devices, null sdev->request_queue->dev will be
dereferenced in blk_set_runtime_active() during above system resume
flow, therefore we add a null pointer checking for this case.

The same issue also happens on those SCSI devices before this patch as
below system resume flow while devices are already runtime-suspended.

scsi_bus_resume_common()
 => blk_set_runtime_active(to_scsi_device(dev)->request_queue)

> 
> Additionally, since the above code occurs inside a block controlled by an
> "if (err == 0)" statement, I think the !err test is redundant and should be
> left out.

Sorry this is my code merge defect.
"err" here shall be returned value from pm_runtime_set_active().

I will fix it in v2.

> 
> Thanks,
> 
> Bart.
> 
> _______________________________________________
> Linux-mediatek mailing list
> Linux-mediatek@xxxxxxxxxxxxxxxxxxx
> http://lists.infradead.org/mailman/listinfo/linux-mediatek