On 1/31/24 15:04, Keith Busch wrote:
> I didn't have anything in mind; just that protocols don't require all
> commands be fast.
The default block layer timeout is 30 seconds because typical storage commands complete in much less than 30 seconds.
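For illustration only (this is a minimal sketch, not code from any particular driver; example_tag_set and example_init_tag_set are made-up names): a blk-mq driver that expects slow commands can override that 30-second default through the timeout member of its tag set:

#include <linux/blk-mq.h>

/* Hypothetical driver tag set; .ops, .nr_hw_queues, .queue_depth etc.
 * omitted for brevity. */
static struct blk_mq_tag_set example_tag_set;

static int example_init_tag_set(void)
{
	/* Raise the request timeout to two minutes instead of the
	 * block layer default of 30 seconds. */
	example_tag_set.timeout = 120 * HZ;
	return blk_mq_alloc_tag_set(&example_tag_set);
}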
> NVMe has wait event commands that might not ever complete.
Are you perhaps referring to the NVMe Asynchronous Event Request command? That command doesn't count because its command ID comes from a different range than the I/O command IDs. From the NVMe driver:

static inline bool nvme_is_aen_req(u16 qid, __u16 command_id)
{
	return !qid && nvme_tag_from_cid(command_id) >= NVME_AQ_BLK_MQ_DEPTH;
}
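In other words, AER completions are diverted before any blk-mq tag lookup happens. Roughly how the completion handler uses that helper (a paraphrased sketch from memory, not verbatim driver code; example_handle_cqe is a made-up name and the tag lookup for regular commands is elided):

static void example_handle_cqe(struct nvme_queue *nvmeq,
			       struct nvme_completion *cqe)
{
	u16 command_id = READ_ONCE(cqe->command_id);

	if (unlikely(nvme_is_aen_req(nvmeq->qid, command_id))) {
		/* No blk-mq request or tag is associated with an AER. */
		nvme_complete_async_event(&nvmeq->dev->ctrl, cqe->status,
					  &cqe->result);
		return;
	}

	/* Regular commands: map the command ID back to a blk-mq request
	 * and complete it (elided). */
}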
> A copy command requesting multiple terabytes won't be quick for even the
> fastest hardware (not "hours", but not fast).
Is there any setup in which such large commands are submitted? Long-running write commands may negatively affect read latency. That is a good reason not to make the max_sectors value too large.
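As an illustration of that knob (a minimal sketch, not taken from any driver; example_limit_io_size is a made-up name): a driver can cap the largest command the block layer will build, and max_sectors_kb in sysfs can only lower it further at runtime:

#include <linux/blkdev.h>

static void example_limit_io_size(struct request_queue *q)
{
	/* 32768 sectors * 512 bytes = 16 MiB per command at most. */
	blk_queue_max_hw_sectors(q, 32768);
}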
> If hardware stops responding, the tags are locked up for as long as it
> takes recovery escalation to reclaim them. For nvme, error recovery
> could take over a minute by default.
If hardware stops responding, who cares about fairness of tag allocation, since that means request processing halts for all queues associated with the controller that locked up?

Bart.