On Fri, Feb 14, 2020 at 08:32:57AM +0100, Hannes Reinecke wrote:
> On 2/13/20 5:17 AM, Martin K. Petersen wrote:
> > People often artificially lower the queue depth to avoid timeouts. The
> > default timeout is 30 seconds from when an I/O is queued. However, many
> > enterprise applications set the timeout to 3-5 seconds, which means
> > that with deep queues you'll quickly start seeing timeouts if a drive
> > is temporarily having trouble keeping up (media errors, excessive
> > spare track seeks, etc.).
> >
> > Well-behaved devices will return QF/TSF if they have transient
> > resource starvation or exceed internal QoS limits. QF will cause the
> > SCSI stack to reduce the number of I/Os in flight. This allows the
> > drive to recover from its congested state and reduces the potential
> > for application and filesystem timeouts.
> >
> This may even be a chance to revisit QoS / queue-busy handling.
> NVMe has this SQ head pointer mechanism which was supposed to handle
> this kind of situation, but to my knowledge no-one has implemented it.
> Might be worthwhile revisiting it; I guess NVMe HDDs would profit from
> that.

We don't need that, because we never allocate enough tags to wrap the
tail past the head. If you can allocate a tag, the queue is not full.
And conversely, no tag == queue full.
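To illustrate the invariant (this is a hypothetical sketch, not the actual
blk-mq or NVMe driver code; QUEUE_SIZE, alloc_tag(), and free_tag() are
made-up names): because at most queue-depth tags exist, holding a tag
guarantees a free submission queue slot, so the tail can never wrap past
the head and the SQ head pointer mechanism is never needed for flow control.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: the tag set is sized to the queue depth, so
 * "tag allocated" and "SQ slot available" are the same condition. */
#define QUEUE_SIZE 8

static bool tag_in_use[QUEUE_SIZE];
static int tags_allocated;

/* Returns a tag index, or -1 if all tags are in use (queue full). */
static int alloc_tag(void)
{
	if (tags_allocated == QUEUE_SIZE)
		return -1;		/* no tag == queue full */
	for (int i = 0; i < QUEUE_SIZE; i++) {
		if (!tag_in_use[i]) {
			tag_in_use[i] = true;
			tags_allocated++;
			return i;	/* tag held == SQ slot free */
		}
	}
	return -1;			/* unreachable given the count check */
}

static void free_tag(int tag)
{
	tag_in_use[tag] = false;
	tags_allocated--;
}
```

Since an I/O can only be submitted while holding a tag, at most QUEUE_SIZE
commands are ever in flight, and submission can never overrun completion.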