Hi Bart, On Wed, 2019-01-02 at 08:15 -0800, Bart Van Assche wrote: > On Wed, 2019-01-02 at 14:25 +0800, stanley.chu@xxxxxxxxxxxx wrote: > > From: Stanley Chu <stanley.chu@xxxxxxxxxxxx> > > > > The commit 356fd2663cff ("scsi: Set request queue runtime PM status > > back to active on resume") fixed up the inconsistent RPM status between > > request queue and device. However changing request queue RPM status > > shall be done only on successful resume, otherwise status may be still > > inconsistent as below, > > > > Request queue: RPM_ACTIVE > > Device: RPM_SUSPENDED > > > > This ends up soft lockup because requests can be submitted to > > underlying devices but those devices and their required resource > > are not resumed. > > > > Signed-off-by: Stanley Chu <stanley.chu@xxxxxxxxxxxx> > > Please add "Fixes:" and "Cc: stable" tags and also Cc the author of commit > 356fd2663cff. Sure. Thanks for remind. > > > > --- > > drivers/scsi/scsi_pm.c | 24 ++++++++++++++---------- > > 1 file changed, 14 insertions(+), 10 deletions(-) > > > > diff --git a/drivers/scsi/scsi_pm.c b/drivers/scsi/scsi_pm.c > > index a2b4179..eff3e59 100644 > > --- a/drivers/scsi/scsi_pm.c > > +++ b/drivers/scsi/scsi_pm.c > > @@ -82,6 +82,20 @@ static int scsi_dev_type_resume(struct device *dev, > > pm_runtime_disable(dev); > > pm_runtime_set_active(dev); > > pm_runtime_enable(dev); > > + > > + /* > > + * Forcibly set runtime PM status of request queue to "active" > > + * to make sure we can again get requests from the queue > > + * (see also blk_pm_peek_request()). > > + * > > + * The resume hook will correct runtime PM status of the disk. > > + */ > > + if (!err && scsi_is_sdev_device(dev)) { > > + struct scsi_device *sdev = to_scsi_device(dev); > > + > > + if (sdev->request_queue->dev) > > + blk_set_runtime_active(sdev->request_queue); > > + } > > What makes you think that the sdev->request_queue->dev test is necessary? The > scsi_dev_type_resume() function is only called after blk_pm_runtime_init() has > finished so I don't think that test is necessary. We found NULL sdev->request_queue->dev may be dereferenced during below system resume flow, scsi_bus_resume_common() => async_schedule_domain(async_sdev_resume) And then async_sdev_resume() is invoked asynchronously, async_sdev_resume() => scsi_dev_type_resume(dev, do_scsi_resume) => blk_set_runtime_active(sdev->request_queue) If a SCSI device does not have upper layer driver (like SCSI disk), it may not be applied blk_pm_runtime_init() invoked by sd_probe() while this SCSI device is added. For example, some SCSI devices (like UFS Boot W-LUN) are added explicitly in __scsi_add_device() by ufshcd_scsi_add_wlus() first and thus sd_probe() for them is skipped because they are already visible. For those SCSI devices, null sdev->request_queue->dev will be dereferenced in blk_set_runtime_active() during above system resume flow, therefore we add a null pointer checking for this case. The same issue also happens on those SCSI devices before this patch as below system resume flow while devices are already runtime-suspended. scsi_bus_resume_common() => blk_set_runtime_active(to_scsi_device(dev)->request_queue) > > Additionally, since the above code occurs inside a block controlled by an > "if (err == 0)" statement, I think the !err test is redundant and should be > left out. Sorry this is my code merge defect. "err" here shall be returned value from pm_runtime_set_active(). I will fix it in v2. > > Thanks, > > Bart. > > _______________________________________________ > Linux-mediatek mailing list > Linux-mediatek@xxxxxxxxxxxxxxxxxxx > http://lists.infradead.org/mailman/listinfo/linux-mediatek