On 11/28/18 8:27 PM, Ming Lei wrote:
> On Wed, Nov 28, 2018 at 08:13:43PM -0700, Jens Axboe wrote:
>> On 11/28/18 7:51 PM, Ming Lei wrote:
>>> On Wed, Nov 28, 2018 at 07:19:09PM -0700, Jens Axboe wrote:
>>>> On 11/28/18 6:23 PM, Ming Lei wrote:
>>>>> On Tue, Nov 27, 2018 at 07:34:51PM -0700, Jens Axboe wrote:
>>>>>> On 11/27/18 7:10 PM, Ming Lei wrote:
>>>>>>> On Mon, Nov 26, 2018 at 09:35:53AM -0700, Jens Axboe wrote:
>>>>>>>> We need this for blk-mq to kick things into gear, if we told it that
>>>>>>>> we had more IO coming, but then failed to deliver on that promise.
>>>>>>>>
>>>>>>>> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
>>>>>>>> ---
>>>>>>>>  drivers/block/virtio_blk.c | 15 +++++++++++++++
>>>>>>>>  1 file changed, 15 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
>>>>>>>> index 6e869d05f91e..b49c57e77780 100644
>>>>>>>> --- a/drivers/block/virtio_blk.c
>>>>>>>> +++ b/drivers/block/virtio_blk.c
>>>>>>>> @@ -214,6 +214,20 @@ static void virtblk_done(struct virtqueue *vq)
>>>>>>>>  	spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);
>>>>>>>>  }
>>>>>>>>
>>>>>>>> +static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx)
>>>>>>>> +{
>>>>>>>> +	struct virtio_blk *vblk = hctx->queue->queuedata;
>>>>>>>> +	int qid = hctx->queue_num;
>>>>>>>> +	bool kick;
>>>>>>>> +
>>>>>>>> +	spin_lock_irq(&vblk->vqs[qid].lock);
>>>>>>>> +	kick = virtqueue_kick_prepare(vblk->vqs[qid].vq);
>>>>>>>> +	spin_unlock_irq(&vblk->vqs[qid].lock);
>>>>>>>> +
>>>>>>>> +	if (kick)
>>>>>>>> +		virtqueue_notify(vblk->vqs[qid].vq);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>>  static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
>>>>>>>>  			   const struct blk_mq_queue_data *bd)
>>>>>>>>  {
>>>>>>>> @@ -638,6 +652,7 @@ static void virtblk_initialize_rq(struct request *req)
>>>>>>>>
>>>>>>>>  static const struct blk_mq_ops virtio_mq_ops = {
>>>>>>>>  	.queue_rq	= virtio_queue_rq,
>>>>>>>> +	.commit_rqs	= virtio_commit_rqs,
>>>>>>>>  	.complete	= virtblk_request_done,
>>>>>>>>  	.init_request	= virtblk_init_request,
>>>>>>>>  #ifdef CONFIG_VIRTIO_BLK_SCSI
>>>>>>>> --
>>>>>>>> 2.17.1
>>>>>>>>
>>>>>>>
>>>>>>> If .commit_rqs() is implemented, virtqueue_notify() in virtio_queue_rq()
>>>>>>> should have been removed, to save the world switch per .queue_rq().
>>>>>>
>>>>>> ->commit_rqs() is only for the case where bd->last is set to false,
>>>>>> and we never make it to the end and flag bd->last == true. If bd->last
>>>>>> is true, the driver should kick things into gear.
>>>>>
>>>>> OK, looks like I misunderstood it. However, virtio-blk doesn't need this
>>>>> change since virtio_queue_rq() can handle it well. This patch may introduce
>>>>> one unnecessary VM world switch in case of queue busy.
>>>>
>>>> No it won't, it may in the case of some failure outside of the driver.
>>>
>>> If the failure is because we are out of tags, blk_mq_dispatch_wake() will
>>> rerun the queue, and bd->last will eventually be set. Or is there some
>>> other failure (outside of the driver) not covered?
>>
>> The point is to make this happen when we commit the IOs, not needing to
>> do a restart (or relying on IO being in-flight). If we're submitting a
>> string of requests, we should not rely on failures happening only due to
>> IO already being in flight and thus restarting us. It defeats the purpose
>> of even having ->last in the first place.
>
> OK, it makes sense.
>
>>
>>>> The only reason that virtio-blk doesn't currently hang is because it
>>>> has restart logic, and the failure case only happens if we already
>>>> have IO in-flight.
>>>
>>> Yeah, virtqueue_kick() is called in case of any error in
>>> virtio_queue_rq(), so I am still wondering why we have to implement
>>> .commit_rqs() for virtio-blk.
>>
>> It's not strictly needed for virtio-blk with the restart logic that it
>> has, but I think it'd be nicer to kill that since we have other real use
>> cases of bd->last at this point.
>>
>>>>> IMO bd->last won't work well in case of an io scheduler, given the rq_list
>>>>> only includes one single request.
>>>>
>>>> But that's a fake limitation that definitely should just be lifted,
>>>> the fact that blk-mq-sched is _currently_ just doing single requests
>>>> is woefully inefficient.
>>>
>>> I agree, but it seems a bit hard given we have to consider request
>>> merging.
>>
>> We don't have to drain everything, it should still be feasible to submit
>> at least a batch of requests. For basic sequential IO, you want to leave
>> the last one in the queue, if you have IOs going, for instance. But
>> doing each and every request individually is a huge extra task. Doing
>> IOPS comparisons of kyber and no scheduler reveals that to be very true.
>>
>>>>> I wrote this kind of patch (never posted) before, using sort of
>>>>> ->commit_rqs() to replace the current bd->last mechanism, which needs
>>>>> one extra driver tag. That may improve the above case, and the code gets
>>>>> cleaned up too.
>>>>
>>>> It doesn't need one extra driver tag, we currently get an extra one just
>>>> to flag ->last correctly. That's not a requirement, that's a limitation
>>>> of the current implementation. We could get rid of that, and if it
>>>> proves to be an issue, that's not hard to do.
>>>
>>> What do you think about using .commit_rqs() to replace ->last? For
>>> example, just call .commit_rqs() after the last request is queued to
>>> the driver successfully. Then we can remove bd->last and avoid getting
>>> the extra tag for figuring out bd->last.
>>
>> I don't want to make ->commit_rqs() part of the regular execution; it
>> should be relegated to the "failure" case of not being able to fulfil
>> our promise of sending a request with bd->last == true. Reasons
>> mentioned earlier, but basically it's more efficient to commit from
>> inside ->queue_rq() if we can, so we don't have to re-grab the
>> submission lock needlessly.
>>
>> I like the idea of separate ->queue and ->commit, but in practice I
>> don't see it working out without a performance penalty.
>
> Thanks for your detailed explanation, this patch looks fine:
>
> Reviewed-by: Ming Lei <ming.lei@xxxxxxxxxx>

Thanks Ming.

--
Jens Axboe
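
For reference, below is a minimal sketch of the bd->last / ->commit_rqs() split the thread settles on: notify the hardware from inside ->queue_rq() when bd->last is true (cheapest, since the submission context is already held), and treat ->commit_rqs() purely as the safety net for when blk-mq queued requests with last == false but could not deliver the promised last == true request. This is not code from the thread; "struct mydev" and the mydev_*() helpers are hypothetical placeholders for a real driver's ring and notification code.

#include <linux/blk-mq.h>

/* Hypothetical driver state and helpers, stand-ins for e.g. virtio's ring code. */
struct mydev;
bool mydev_add_request(struct mydev *dev, struct request *rq);
void mydev_notify_hw(struct mydev *dev);

static blk_status_t my_queue_rq(struct blk_mq_hw_ctx *hctx,
				const struct blk_mq_queue_data *bd)
{
	struct mydev *dev = hctx->queue->queuedata;

	blk_mq_start_request(bd->rq);

	if (!mydev_add_request(dev, bd->rq))
		return BLK_STS_DEV_RESOURCE;	/* ring full, blk-mq requeues and retries */

	/*
	 * Kick the device from ->queue_rq() when this is the last request
	 * of the batch: we already hold the submission context here, so no
	 * extra lock round-trip is needed for the common case.
	 */
	if (bd->last)
		mydev_notify_hw(dev);

	return BLK_STS_OK;
}

/*
 * Fallback: blk-mq calls this if it queued requests with bd->last == false
 * but then failed to send the final bd->last == true request (a failure
 * outside the driver). Without it, queued requests could sit in the ring
 * with the device never being notified.
 */
static void my_commit_rqs(struct blk_mq_hw_ctx *hctx)
{
	struct mydev *dev = hctx->queue->queuedata;

	mydev_notify_hw(dev);
}

static const struct blk_mq_ops my_mq_ops = {
	.queue_rq	= my_queue_rq,
	.commit_rqs	= my_commit_rqs,
};

The shape mirrors Jens's point above: the kick stays on the ->queue_rq() fast path for the bd->last == true case, and ->commit_rqs() only has to re-grab the submission lock in the rare case where the batch was cut short.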