Re: [PATCH 5/8] virtio_blk: implement mq_ops->commit_rqs() hook

On 11/28/18 8:27 PM, Ming Lei wrote:
> On Wed, Nov 28, 2018 at 08:13:43PM -0700, Jens Axboe wrote:
>> On 11/28/18 7:51 PM, Ming Lei wrote:
>>> On Wed, Nov 28, 2018 at 07:19:09PM -0700, Jens Axboe wrote:
>>>> On 11/28/18 6:23 PM, Ming Lei wrote:
>>>>> On Tue, Nov 27, 2018 at 07:34:51PM -0700, Jens Axboe wrote:
>>>>>> On 11/27/18 7:10 PM, Ming Lei wrote:
>>>>>>> On Mon, Nov 26, 2018 at 09:35:53AM -0700, Jens Axboe wrote:
>>>>>>>> We need this for blk-mq to kick things into gear, if we told it that
>>>>>>>> we had more IO coming, but then failed to deliver on that promise.
>>>>>>>>
>>>>>>>> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
>>>>>>>> ---
>>>>>>>>  drivers/block/virtio_blk.c | 15 +++++++++++++++
>>>>>>>>  1 file changed, 15 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
>>>>>>>> index 6e869d05f91e..b49c57e77780 100644
>>>>>>>> --- a/drivers/block/virtio_blk.c
>>>>>>>> +++ b/drivers/block/virtio_blk.c
>>>>>>>> @@ -214,6 +214,20 @@ static void virtblk_done(struct virtqueue *vq)
>>>>>>>>  	spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);
>>>>>>>>  }
>>>>>>>>  
>>>>>>>> +static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx)
>>>>>>>> +{
>>>>>>>> +	struct virtio_blk *vblk = hctx->queue->queuedata;
>>>>>>>> +	int qid = hctx->queue_num;
>>>>>>>> +	bool kick;
>>>>>>>> +
>>>>>>>> +	spin_lock_irq(&vblk->vqs[qid].lock);
>>>>>>>> +	kick = virtqueue_kick_prepare(vblk->vqs[qid].vq);
>>>>>>>> +	spin_unlock_irq(&vblk->vqs[qid].lock);
>>>>>>>> +
>>>>>>>> +	if (kick)
>>>>>>>> +		virtqueue_notify(vblk->vqs[qid].vq);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>>  static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
>>>>>>>>  			   const struct blk_mq_queue_data *bd)
>>>>>>>>  {
>>>>>>>> @@ -638,6 +652,7 @@ static void virtblk_initialize_rq(struct request *req)
>>>>>>>>  
>>>>>>>>  static const struct blk_mq_ops virtio_mq_ops = {
>>>>>>>>  	.queue_rq	= virtio_queue_rq,
>>>>>>>> +	.commit_rqs	= virtio_commit_rqs,
>>>>>>>>  	.complete	= virtblk_request_done,
>>>>>>>>  	.init_request	= virtblk_init_request,
>>>>>>>>  #ifdef CONFIG_VIRTIO_BLK_SCSI
>>>>>>>> -- 
>>>>>>>> 2.17.1
>>>>>>>>
>>>>>>>
>>>>>>> If .commit_rqs() is implemented, the virtqueue_notify() in
>>>>>>> virtio_queue_rq() should be removed, to save the world switch
>>>>>>> per .queue_rq().
>>>>>>
>>>>>> ->commit_rqs() is only for the case where bd->last is set to false,
>>>>>> and we never make it to the end to flag bd->last == true. If bd->last
>>>>>> is true, the driver should kick things into gear.
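
For reference, the driver-side half of this contract is the bd->last
handling at the end of virtio_queue_rq(). A simplified sketch of that
path (paraphrased from the driver, not the verbatim code):

	/* Only prepare a kick when blk-mq says this request ends the batch */
	if (bd->last && virtqueue_kick_prepare(vblk->vqs[qid].vq))
		notify = true;
	spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);

	if (notify)
		virtqueue_notify(vblk->vqs[qid].vq);	/* one world switch */
	return BLK_STS_OK;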
>>>>>
>>>>> OK, looks like I misunderstood it. However, virtio-blk doesn't need
>>>>> this change, since virtio_queue_rq() can handle it well. This patch
>>>>> may introduce one unnecessary VM world switch in case the queue is
>>>>> busy.
>>>>
>>>> No it won't, but it may in the case of some failure outside of the driver.
>>>
>>> If the failure is because we're out of tags, blk_mq_dispatch_wake() will
>>> rerun the queue, and bd->last will eventually be set. Or is there some
>>> other failure (outside of the driver) that isn't covered?
>>
>> The point is to make this happen when we commit the IOs, without needing
>> a restart (or relying on IO being in-flight). If we're submitting a
>> string of requests, we should not rely on failures happening only while
>> IO is in-flight and thus restarting us. That defeats the purpose of even
>> having ->last in the first place.
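
(To make the intended call site concrete: the core dispatch loop would do
something along these lines after draining its list. This is a
hypothetical, simplified sketch; the variable names are illustrative, not
the committed blk-mq code.)

	/*
	 * We told the driver more was coming (bd->last == false), but
	 * dispatch stopped before any request went in with bd->last set.
	 * Let the driver flush whatever it already accepted.
	 */
	if (!list_empty(list) && queued && q->mq_ops->commit_rqs)
		q->mq_ops->commit_rqs(hctx);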
> 
> OK, it makes sense.
> 
>>
>>>> The only reason that virtio-blk doesn't currently hang is because it
>>>> has restart logic, and the failure case only happens if we already
>>>> have IO in-flight.
>>>
>>> Yeah, virtqueue_kick() is called in case of any error in
>>> virtio_queue_rq(), so I am still wondering why we have to implement
>>> .commit_rqs() for virtio-blk.
>>
>> It's not strictly needed for virtio-blk with the restart logic that it
>> has, but I think it'd be nicer to kill that since we have other real use
>> cases of bd->last at this point.
>>
>>>>> IMO bd->last won't work well in the case of an IO scheduler, given
>>>>> that the rq_list only includes a single request.
>>>>
>>>> But that's a fake limitation that definitely should just be lifted;
>>>> the fact that blk-mq-sched is _currently_ just doing single requests
>>>> is woefully inefficient.
>>>
>>> I agree, but it seems a bit hard given that we have to consider
>>> request merging.
>>
>> We don't have to drain everything; it should still be feasible to submit
>> at least a batch of requests. For basic sequential IO, you want to leave
>> the last one in the queue if you have IOs going, for instance. But doing
>> each and every request individually is a huge extra cost. IOPS
>> comparisons of kyber versus no scheduler reveal that to be very true.
>>
>>>>> I wrote this kind of patch (never posted) before, using a sort of
>>>>> ->commit_rqs() to replace the current bd->last mechanism, which needs
>>>>> one extra driver tag. That may improve the above case, and the code
>>>>> gets cleaned up too.
>>>>
>>>> It doesn't need one extra driver tag; we currently get an extra one
>>>> just to flag ->last correctly. That's not a requirement, that's a
>>>> limitation of the current implementation. We could get rid of that,
>>>> and if it proves to be an issue, that's not hard to do.
>>>
>>> What do you think about using .commit_rqs() to replace ->last? For
>>> example, just call .commit_rqs() after the last request is queued to
>>> the driver successfully. Then we can remove bd->last and avoid getting
>>> the extra tag for figuring out bd->last.
>>
>> I don't want to make ->commit_rqs() part of the regular execution; it
>> should be relegated to the "failure" case of not being able to fulfil
>> our promise of sending a request with bd->last == true. Reasons
>> mentioned earlier, but basically it's more efficient to commit from
>> inside ->queue_rq() if we can, so we don't have to re-grab the
>> submission lock needlessly.
>>
>> I like the idea of separate ->queue and ->commit, but in practice I
>> don't see it working out without a performance penalty.
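
(To illustrate the lock cost of making ->commit_rqs() unconditional: the
final request of every batch would pay a second round-trip on the
virtqueue lock. A hypothetical sketch, not real driver code:)

	/* ->queue_rq() without bd->last: add the request, drop the lock */
	spin_lock_irq(&vblk->vqs[qid].lock);
	/* ... map and add the request to the virtqueue ... */
	spin_unlock_irq(&vblk->vqs[qid].lock);

	/* Unconditional ->commit_rqs(): re-grab the same lock just to kick */
	spin_lock_irq(&vblk->vqs[qid].lock);
	kick = virtqueue_kick_prepare(vblk->vqs[qid].vq);
	spin_unlock_irq(&vblk->vqs[qid].lock);
	if (kick)
		virtqueue_notify(vblk->vqs[qid].vq);

(With bd->last, the add and the virtqueue_kick_prepare() happen under a
single lock hold inside ->queue_rq() itself.)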
> 
> Thanks for your detailed explanation, this patch looks fine:
> 
> Reviewed-by: Ming Lei <ming.lei@xxxxxxxxxx>

Thanks Ming.

-- 
Jens Axboe



