On Wed, Aug 30, 2017 at 09:51:31AM -0600, Jens Axboe wrote: > On 08/30/2017 09:39 AM, Ming Lei wrote: > > On Wed, Aug 30, 2017 at 09:22:42AM -0600, Jens Axboe wrote: > >> On 08/30/2017 09:19 AM, Ming Lei wrote: > >>> It is more reasonable to add requests to ->dispatch in way > >>> of FIFO style, instead of LIFO style. > >>> > >>> Also in this way, we can allow to insert request at the front > >>> of hw queue, which function is needed to fix one bug > >>> in blk-mq's implementation of blk_execute_rq() > >>> > >>> Reported-by: Oleksandr Natalenko <oleksandr@xxxxxxxxxxxxxx> > >>> Tested-by: Oleksandr Natalenko <oleksandr@xxxxxxxxxxxxxx> > >>> Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx> > >>> --- > >>> block/blk-mq-sched.c | 2 +- > >>> block/blk-mq.c | 2 +- > >>> 2 files changed, 2 insertions(+), 2 deletions(-) > >>> > >>> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c > >>> index 4ab69435708c..8d97df40fc28 100644 > >>> --- a/block/blk-mq-sched.c > >>> +++ b/block/blk-mq-sched.c > >>> @@ -272,7 +272,7 @@ static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx, > >>> * the dispatch list. > >>> */ > >>> spin_lock(&hctx->lock); > >>> - list_add(&rq->queuelist, &hctx->dispatch); > >>> + list_add_tail(&rq->queuelist, &hctx->dispatch); > >>> spin_unlock(&hctx->lock); > >>> return true; > >>> } > >>> diff --git a/block/blk-mq.c b/block/blk-mq.c > >>> index 4603b115e234..fed3d0c16266 100644 > >>> --- a/block/blk-mq.c > >>> +++ b/block/blk-mq.c > >>> @@ -1067,7 +1067,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list) > >>> blk_mq_put_driver_tag(rq); > >>> > >>> spin_lock(&hctx->lock); > >>> - list_splice_init(list, &hctx->dispatch); > >>> + list_splice_tail_init(list, &hctx->dispatch); > >>> spin_unlock(&hctx->lock); > >> > >> I'm not convinced this is safe, there's actually a reason why the > >> request is added to the front and not the back. We do have > >> reorder_tags_to_front() as a safe guard, but I'd much rather get rid of > > > > reorder_tags_to_front() is for reordering the requests in current list, > > this patch is for splicing list into hctx->dispatch, so I can't see > > it isn't safe, or could you explain it a bit? > > If we can get the ordering right, then down the line we won't need to > have the tags reordering at all. It's an ugly hack that I'd love to see > go away. If reorder_tags_to_front() isn't removed, this patch is still safe. But blk_execute_rq_nowait() need to add one request in the front of hw queue, that can be a contradiction compared with maintaining a perfect order for removing reorder_tags_to_front(). So could you share your opinion on the 2nd patch for fixing blk_execute_rq_nowait()? > > >> that than make this change. > >> > >> What's your reasoning here? Your changelog doesn't really explain why > > > > Firstly the 2nd patch need to add one rq(such as RQF_PM) to the > > front of the hw queue, the simple way is to add it to the front > > of hctx->dispatch. Without this change, the 2nd patch can't work > > at all. > > > > Secondly this way is still reasonable: > > > > - one rq is added to hctx->dispatch because queue is busy > > - another rq is added to hctx->dispatch too because of same reason > > > > so it is reasonable to to add list into hctx->dispatch in FIFO style. > > Not disagreeing with the logic. But it also begs the question of why we > don't apply the same treatment to when we splice leftovers to the > dispatch list, currently we front splice that. > > All I'm saying is that you need to tread very carefully with this, and > throw it through some careful testing to ensure that we don't introduce > conditions that now livelock. NVMe is the easy test case, that will Yes, ->dispatch is far away from NVMe, but friends of SCSI-MQ. > generally always work since we never run out of tags. The problematic > test case is usually things like SATA with 31 tags, and especially SATA > with flushes that don't queue. One good test case is the one where you > end up having all tags (or almost all) consumed by flushes, and still > ensuring that we're making forward progress. Understood! Even we can make the test more aggressive. I just setup one virtio-scsi by changing both 'can_queue' and 'cmd_per_lun' as 1, that means the hw queue depth is 1, and hw queue number is set as 1. Then I run 'dbench -t 30 -s -F 64' in ext4 which is over this virtio-scsi device. The dbench(64 jobs, sync write, fsync) just works fine with this patch applied. -- Ming