On Fri, Feb 14, 2020 at 01:30:38AM +0900, Keith Busch wrote:
> On Thu, Feb 13, 2020 at 04:34:13PM +0800, Ming Lei wrote:
> > On Thu, Feb 13, 2020 at 08:24:36AM +0000, Damien Le Moal wrote:
> > > Got it. And since queue full will mean no more tags, submission will
> > > block on get_request() and there will be no chance in the elevator to
> > > merge anything (aside from opportunistic merging in plugs), isn't it?
> > > So I guess NVMe HDDs will need some tuning in this area.
> >
> > The scheduler queue depth is usually 2 times the hw queue depth, so
> > requests are usually enough for merging.
> >
> > For NVMe, there isn't a per-namespace queue depth, such as SCSI's device
> > queue depth; meanwhile the hw queue depth is big enough, so there is no
> > chance to trigger merging.
>
> Most NVMe devices contain a single namespace anyway, so the shared tag
> queue depth is effectively the ns queue depth, and an NVMe HDD should
> advertise queue count and depth capabilities orders of magnitude lower
> than what we're used to with nvme SSDs. That should get merging and
> BLK_STS_DEV_RESOURCE handling to occur as desired, right?

Right. The advertised queue depth might serve two purposes:

1) reflect the namespace's actual queueing capability, so the block
   layer's merging is possible

2) avoid timeouts caused by too many in-flight IOs

Thanks,
Ming
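
P.S. For anyone wondering where the "2 times" above comes from: a minimal
sketch of how the scheduler tag depth relates to the hw queue depth when an
I/O scheduler is attached, assuming the usual blk-mq behavior. The helper
name and the 128 cap here are illustrative placeholders, not code copied
from the tree.

	/*
	 * Illustrative only: the scheduler tag depth is derived from the hw
	 * queue depth (capped, then doubled), so submitters can usually still
	 * get a request while all hw tags are in flight, which gives the
	 * elevator a chance to merge.  Names/values are placeholders.
	 */
	static unsigned int sched_queue_depth(unsigned int hw_queue_depth)
	{
		unsigned int capped = hw_queue_depth < 128 ? hw_queue_depth : 128;

		return 2 * capped;	/* "2 times the hw queue depth" */
	}

With a very large hw queue depth (typical for NVMe SSDs) submissions never
wait for a tag, so requests rarely sit long enough to be merged; a low
advertised depth, as Keith suggests for NVMe HDDs, changes that.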