Re: [PATCH RFC 3/4] blk-mq: prepare for supporting runtime PM

Jens Axboe <axboe@xxxxxxxxx> · Thu, 12 Jul 2018 15:44:05 -0600

On 7/12/18 3:32 PM, Ming Lei wrote:
> On Thu, Jul 12, 2018 at 10:00 PM, Jens Axboe <axboe@xxxxxxxxx> wrote:
>> On 7/12/18 6:28 AM, Ming Lei wrote:
>>> On Thu, Jul 12, 2018 at 05:58:28PM +0800, Ming Lei wrote:
>>>> On Thu, Jul 12, 2018 at 12:29:05AM +0800, Ming Lei wrote:
>>>>> This patch introduces blk_mq_pm_add_request() which is called after
>>>>> allocating one request. Also blk_mq_pm_put_request() is introduced
>>>>> and called after one request is freed.
>>>>>
>>>>> For blk-mq, it can be quite expensive to accounting in-flight IOs,
>>>>> so this patch calls pm_runtime_mark_last_busy() simply after each IO
>>>>> is done, instead of doing that only after the last in-flight IO is done.
>>>>> This way is still workable, since the active non-PM IO will be checked
>>>>> in blk_pre_runtime_suspend(), and runtime suspend will be prevented
>>>>> if there is any active non-PM IO.
>>>>>
>>>>> Also makes blk_post_runtime_resume() to cover blk-mq.
>>>>>
>>>>> Cc: "Rafael J. Wysocki" <rjw@xxxxxxxxxxxxx>
>>>>> Cc: Alan Stern <stern@xxxxxxxxxxxxxxxxxxx>
>>>>> Cc: linux-pm@xxxxxxxxxxxxxxx
>>>>> Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
>>>>> Cc: Christoph Hellwig <hch@xxxxxx>
>>>>> Cc: Bart Van Assche <bart.vanassche@xxxxxxx>
>>>>> Cc: "James E.J. Bottomley" <jejb@xxxxxxxxxxxxxxxxxx>
>>>>> Cc: "Martin K. Petersen" <martin.petersen@xxxxxxxxxx>
>>>>> Cc: linux-scsi@xxxxxxxxxxxxxxx
>>>>> Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
>>>>> ---
>>>>>  block/blk-core.c | 12 ++++++++++--
>>>>>  block/blk-mq.c   | 24 ++++++++++++++++++++++++
>>>>>  2 files changed, 34 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/block/blk-core.c b/block/blk-core.c
>>>>> index c4b57d8806fe..bf66d561980d 100644
>>>>> --- a/block/blk-core.c
>>>>> +++ b/block/blk-core.c
>>>>> @@ -3804,12 +3804,17 @@ EXPORT_SYMBOL(blk_pm_runtime_init);
>>>>>  int blk_pre_runtime_suspend(struct request_queue *q)
>>>>>  {
>>>>>     int ret = 0;
>>>>> +   bool active;
>>>>>
>>>>>     if (!q->dev)
>>>>>             return ret;
>>>>>
>>>>>     spin_lock_irq(q->queue_lock);
>>>>> -   if (q->nr_pending) {
>>>>> +   if (!q->mq_ops)
>>>>> +           active = !!q->nr_pending;
>>>>> +   else
>>>>> +           active = !blk_mq_pm_queue_idle(q);
>>>>> +   if (active) {
>>>>>             ret = -EBUSY;
>>>>>             pm_runtime_mark_last_busy(q->dev);
>>>>>     } else {
>>>>
>>>> Looks there is one big issue, one new IO may come just after reading
>>>> 'active' and before writing RPM_SUSPENDING to q->rpm_status, and both
>>>> the suspending and the new IO may be in-progress at the same time.
>>>
>>> One idea I thought of is to use seqlock to sync changing & reading q->rpm_status,
>>> and looks read lock(read_seqcount_begin/read_seqcount_retry) shouldn't introduce
>>> big cost in fast path.
>>
>> Let's please keep in mind that this is runtime pm stuff. Better to
>> make the rules relaxed around it, instead of adding synchronization.
> 
> But the race has to be avoided, otherwise IO may be failed. I don't
> find any simple solution yet for avoiding the race without adding sync.
> 
> Any idea for avoiding the race without using sync like seqlock or others?

I just don't want anything like this in the hot path. Why can't we
handle this similarly to how we handle request timeouts? It'll
potentially delay the suspend by a few seconds, but surely that can't be
a big deal. I don't see why we need to track this on a per-request
basis.

-- 
Jens Axboe