Hi Jens,

On Thu, May 04, 2017 at 08:06:15AM -0600, Jens Axboe wrote:
...
>
> No we do not. 256 is a LOT. I realize most of the devices expose 64K *
> num_hw_queues of depth. Expecting to utilize all that is insane.
> Internally, these devices have nowhere near that amount of parallelism.
> Hence we'd go well beyond the latency knee in the curve if we just allow
> tons of writeback to queue up, for example. Reaching peak performance on
> these devices do not require more than 256 requests, in fact it can be
> done much sooner. For a default setting, I'd actually argue that 256 is
> too much, and that we should set it lower.

After studying SSD and NVMe a bit, I think your point that '256 is a LOT'
is correct:

1) Inside an SSD, the number of channels isn't big (often around 10 in
high-end SSDs), so a depth of 256 should be enough to keep every channel
busy, even when multi-bank, multi-die or multi-plane parallelism is taken
into account.

2) For NVMe, the I/O queue depth (size) is at most 64K according to the
spec, and the driver currently caps it at 1024, but the queue itself is
allocated entirely from host memory, so a big queue size doesn't add to
the cost of the NVMe controller.

So I think we can respect .nr_requests by resizing the hw tags in
blk_mq_init_sched(), as suggested by Omar. If you don't object, I will
send out V3 soon.

Thanks,
Ming
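
P.S. just to make the direction concrete, below is a very rough sketch
(not the actual V3 patch) of what I mean by "resizing hw tags in
blk_mq_init_sched()". The helper name is made up, and it assumes
blk_mq_tag_update_depth() keeps its current form (hctx, tags pointer,
new depth, can_grow); the real change may look quite different.

/*
 * Sketch only: would live in block/blk-mq-sched.c, which already pulls
 * in blk-mq.h and blk-mq-tag.h. Called after the scheduler is set up so
 * that the hw tag depth of each hctx respects q->nr_requests.
 */
static int blk_mq_sched_resize_hw_tags(struct request_queue *q)
{
	struct blk_mq_hw_ctx *hctx;
	int i, ret;

	queue_for_each_hw_ctx(q, hctx, i) {
		/*
		 * can_grow == false: we only ever shrink below the depth
		 * the tag set was originally allocated with, so no new
		 * tag allocation is needed here.
		 */
		ret = blk_mq_tag_update_depth(hctx, &hctx->tags,
					      q->nr_requests, false);
		if (ret)
			return ret;
	}

	return 0;
}

The idea is that for devices with a huge native tag space (like NVMe),
the shrunk hw tag depth becomes the effective queue depth, so whatever
the admin writes to /sys/block/<dev>/queue/nr_requests is actually
honoured instead of being ignored.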