Hi Jens,

On Thu, May 04, 2017 at 08:06:15AM -0600, Jens Axboe wrote:
...
>
> No we do not. 256 is a LOT. I realize most of the devices expose 64K *
> num_hw_queues of depth. Expecting to utilize all that is insane.
> Internally, these devices have nowhere near that amount of parallelism.
> Hence we'd go well beyond the latency knee in the curve if we just allow
> tons of writeback to queue up, for example. Reaching peak performance on
> these devices do not require more than 256 requests, in fact it can be
> done much sooner. For a default setting, I'd actually argue that 256 is
> too much, and that we should set it lower.

After studying SSD and NVMe a bit, I think your point that '256 is a LOT'
is correct:

1) Inside an SSD, the number of channels isn't big (often around 10 in
high-end SSDs), so a depth of 256 should be enough to keep every channel
busy, even when multi-bank, multi-die or multi-plane parallelism is taken
into account.

2) For NVMe, the I/O queue depth (size) is at most 64K according to the
spec, and the driver currently caps it at 1024, but the queue itself is
allocated entirely from host memory, so a big queue size doesn't add to
the cost of the NVMe controller.

So I think we can respect .nr_requests by resizing the hw tags in
blk_mq_init_sched(), as suggested by Omar. If you don't object, I will
send out V3 soon.

Thanks,
Ming
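
P.S. just to make the direction concrete, below is a very rough sketch
(not the actual V3 patch) of what I mean by "resizing hw tags in
blk_mq_init_sched()". The helper name is made up, and it assumes
blk_mq_tag_update_depth() keeps its current form (hctx, tags pointer,
new depth, can_grow); the real change may look quite different.

/*
 * Sketch only: would live in block/blk-mq-sched.c, which already pulls
 * in blk-mq.h and blk-mq-tag.h. Called after the scheduler is set up so
 * that the hw tag depth of each hctx respects q->nr_requests.
 */
static int blk_mq_sched_resize_hw_tags(struct request_queue *q)
{
	struct blk_mq_hw_ctx *hctx;
	int i, ret;

	queue_for_each_hw_ctx(q, hctx, i) {
		/*
		 * can_grow == false: we only ever shrink below the depth
		 * the tag set was originally allocated with, so no new
		 * tag allocation is needed here.
		 */
		ret = blk_mq_tag_update_depth(hctx, &hctx->tags,
					      q->nr_requests, false);
		if (ret)
			return ret;
	}

	return 0;
}

The idea is that for devices with a huge native tag space (like NVMe),
the shrunk hw tag depth becomes the effective queue depth, so whatever
the admin writes to /sys/block/<dev>/queue/nr_requests is actually
honoured instead of being ignored.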