On 2020-04-29 22:40, Can Guo wrote: > On 2020-04-30 13:08, Bart Van Assche wrote: >> On 2020-04-29 21:10, Can Guo wrote: >>> During system resume, scsi_resume_device() decreases a request queue's >>> pm_only counter if the scsi device was quiesced before. But after that, >>> if the scsi device's RPM status is RPM_SUSPENDED, the pm_only counter is >>> still held (non-zero). Current scsi resume hook only sets the RPM status >>> of the scsi device and its request queue to RPM_ACTIVE, but leaves the >>> pm_only counter unchanged. This may make the request queue's pm_only >>> counter remain non-zero after resume hook returns, hence those who are >>> waiting on the mq_freeze_wq would never be woken up. Fix this by calling >>> blk_post_runtime_resume() if pm_only is non-zero to balance the pm_only >>> counter which is held by the scsi device's RPM ops. >> >> How was this issue discovered? How has this patch been tested? > > As the issue was found after system resumes, so the issue was discovered > during system suspend/resume test, and it is very easy to be replicated. > After system resumes, if this issue hits some scsi devices, all bios sent > to their request queues are blocked, which may cause a system hang if the > scsi devices are vital to system functionality. > > To make sure the patch work well, we have tested system suspend/resume > and made sure no system hang happen due to request queues got blocked > by imbalanced pm_only counter. Thanks, that's very interesting information. My concern with this patch is that the power management code is not the only caller of blk_set_pm_only() / blk_clear_pm_only(). E.g. the SCSI SPI code also calls scsi_device_quiesce() and scsi_device_resume(). These last functions call blk_set_pm_only() and blk_clear_pm_only(). More calls of scsi_device_quiesce() and scsi_device_resume() might be added in the future. Has it been considered to test directly whether a SCSI device has been runtime suspended instead of relying on blk_queue_pm_only()? How about using pm_runtime_status_suspended() or adding a function in block/blk-pm.h that checks whether q->rpm_status == RPM_SUSPENDED? Thanks, Bart.