On Fri, Jul 13, 2018 at 08:21:09AM -0600, Jens Axboe wrote:
> On 7/13/18 2:05 AM, Ming Lei wrote:
> > Hi Guys,
> >
> > Runtime PM is usually enabled for SCSI devices, and we are switching to
> > SCSI_MQ recently, but runtime PM isn't supported yet by blk-mq, and
> > people may complain about that.
> >
> > This patch tries to support runtime PM for blk-mq. One challenge is
> > that it can be quite expensive to account the active in-flight IOs for
> > figuring out when to mark the last busy. This patch simply marks busy
> > after each non-PM IO is done, and this way is workable because:
> >
> > 1) pm_runtime_mark_last_busy() is very cheap
> >
> > 2) in-flight non-PM IO is checked in blk_pre_runtime_suspend(), so
> > if there is any IO queued, the device will be prevented from being
> > suspended.
> >
> > 3) Generally speaking, autosuspend_delay_ms is often big, and should
> > be in unit of second, so it shouldn't be a big deal to check if queue
> > is idle in blk_pre_runtime_suspend().
> >
> >
> > V2:
> > - re-organize code as suggested by Christoph
> > - use seqlock to sync runtime PM and IO path
>
> See other mail on why this is not going to be acceptable.

OK, I am thinking of another idea for addressing this issue.

We may introduce one logical admin (PM) request queue for each
scsi_device. This queue shares tags with the IO queue, has NO_SCHED set,
and always uses the atomic mode of the queue usage refcounter. Then we
may send PM commands to the device after the IO queue is frozen.

PREEMPT_ONLY can also be removed this way.

In the future, all pass-through commands may even be sent to this admin
queue.

If no one objects, I will cook patches in this direction.

Thanks,
Ming