On Thu, Jul 12, 2018 at 10:00 PM, Jens Axboe <axboe@xxxxxxxxx> wrote: > On 7/12/18 6:28 AM, Ming Lei wrote: >> On Thu, Jul 12, 2018 at 05:58:28PM +0800, Ming Lei wrote: >>> On Thu, Jul 12, 2018 at 12:29:05AM +0800, Ming Lei wrote: >>>> This patch introduces blk_mq_pm_add_request() which is called after >>>> allocating one request. Also blk_mq_pm_put_request() is introduced >>>> and called after one request is freed. >>>> >>>> For blk-mq, it can be quite expensive to accounting in-flight IOs, >>>> so this patch calls pm_runtime_mark_last_busy() simply after each IO >>>> is done, instead of doing that only after the last in-flight IO is done. >>>> This way is still workable, since the active non-PM IO will be checked >>>> in blk_pre_runtime_suspend(), and runtime suspend will be prevented >>>> if there is any active non-PM IO. >>>> >>>> Also makes blk_post_runtime_resume() to cover blk-mq. >>>> >>>> Cc: "Rafael J. Wysocki" <rjw@xxxxxxxxxxxxx> >>>> Cc: Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> >>>> Cc: linux-pm@xxxxxxxxxxxxxxx >>>> Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> >>>> Cc: Christoph Hellwig <hch@xxxxxx> >>>> Cc: Bart Van Assche <bart.vanassche@xxxxxxx> >>>> Cc: "James E.J. Bottomley" <jejb@xxxxxxxxxxxxxxxxxx> >>>> Cc: "Martin K. Petersen" <martin.petersen@xxxxxxxxxx> >>>> Cc: linux-scsi@xxxxxxxxxxxxxxx >>>> Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx> >>>> --- >>>> block/blk-core.c | 12 ++++++++++-- >>>> block/blk-mq.c | 24 ++++++++++++++++++++++++ >>>> 2 files changed, 34 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/block/blk-core.c b/block/blk-core.c >>>> index c4b57d8806fe..bf66d561980d 100644 >>>> --- a/block/blk-core.c >>>> +++ b/block/blk-core.c >>>> @@ -3804,12 +3804,17 @@ EXPORT_SYMBOL(blk_pm_runtime_init); >>>> int blk_pre_runtime_suspend(struct request_queue *q) >>>> { >>>> int ret = 0; >>>> + bool active; >>>> >>>> if (!q->dev) >>>> return ret; >>>> >>>> spin_lock_irq(q->queue_lock); >>>> - if (q->nr_pending) { >>>> + if (!q->mq_ops) >>>> + active = !!q->nr_pending; >>>> + else >>>> + active = !blk_mq_pm_queue_idle(q); >>>> + if (active) { >>>> ret = -EBUSY; >>>> pm_runtime_mark_last_busy(q->dev); >>>> } else { >>> >>> Looks there is one big issue, one new IO may come just after reading >>> 'active' and before writing RPM_SUSPENDING to q->rpm_status, and both >>> the suspending and the new IO may be in-progress at the same time. >> >> One idea I thought of is to use seqlock to sync changing & reading q->rpm_status, >> and looks read lock(read_seqcount_begin/read_seqcount_retry) shouldn't introduce >> big cost in fast path. > > Let's please keep in mind that this is runtime pm stuff. Better to > make the rules relaxed around it, instead of adding synchronization. But the race has to be avoided, otherwise IO may be failed. I don't find any simple solution yet for avoiding the race without adding sync. Any idea for avoiding the race without using sync like seqlock or others? Thanks, Ming Lei