On 1/31/24 15:04, Keith Busch wrote:
> I didn't have anything in mind; just that protocols don't require all
> commands be fast.
The default block layer timeout is 30 seconds because typical storage commands complete in much less than 30 seconds.
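For illustration only (this is a minimal sketch, not code from any particular driver; example_tag_set and example_init_tag_set are made-up names): a blk-mq driver that expects slow commands can override that 30-second default through the timeout member of its tag set:

#include <linux/blk-mq.h>

/* Hypothetical driver tag set; .ops, .nr_hw_queues, .queue_depth etc.
 * omitted for brevity. */
static struct blk_mq_tag_set example_tag_set;

static int example_init_tag_set(void)
{
	/* Raise the request timeout to two minutes instead of the
	 * block layer default of 30 seconds. */
	example_tag_set.timeout = 120 * HZ;
	return blk_mq_alloc_tag_set(&example_tag_set);
}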
> NVMe has wait event commands that might not ever complete.
Are you perhaps referring to the NVMe Asynchronous Event Request command? That command doesn't count because its command ID comes from a different range than the I/O command IDs. From the NVMe driver:

static inline bool nvme_is_aen_req(u16 qid, __u16 command_id)
{
	return !qid && nvme_tag_from_cid(command_id) >= NVME_AQ_BLK_MQ_DEPTH;
}
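In other words, AER completions are diverted before any blk-mq tag lookup happens. Roughly how the completion handler uses that helper (a paraphrased sketch from memory, not verbatim driver code; example_handle_cqe is a made-up name and the tag lookup for regular commands is elided):

static void example_handle_cqe(struct nvme_queue *nvmeq,
			       struct nvme_completion *cqe)
{
	u16 command_id = READ_ONCE(cqe->command_id);

	if (unlikely(nvme_is_aen_req(nvmeq->qid, command_id))) {
		/* No blk-mq request or tag is associated with an AER. */
		nvme_complete_async_event(&nvmeq->dev->ctrl, cqe->status,
					  &cqe->result);
		return;
	}

	/* Regular commands: map the command ID back to a blk-mq request
	 * and complete it (elided). */
}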
> A copy command requesting multiple terabytes won't be quick for even the
> fastest hardware (not "hours", but not fast).
Is there any setup in which such large commands are submitted? Long-running write commands may negatively affect read latency. That is a good reason not to make the max_sectors value too large.
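As an illustration of that knob (a minimal sketch, not taken from any driver; example_limit_io_size is a made-up name): a driver can cap the largest command the block layer will build, and max_sectors_kb in sysfs can only lower it further at runtime:

#include <linux/blkdev.h>

static void example_limit_io_size(struct request_queue *q)
{
	/* 32768 sectors * 512 bytes = 16 MiB per command at most. */
	blk_queue_max_hw_sectors(q, 32768);
}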
> If hardware stops responding, the tags are locked up for as long as it
> takes recovery escalation to reclaim them. For nvme, error recovery
> could take over a minute by default.
If hardware stops responding, who cares about fairness of tag allocation, since that means request processing halts for all queues associated with the controller that locked up?

Bart.