On 2/1/24 08:41, Bart Van Assche wrote:
> On 1/31/24 15:04, Keith Busch wrote:
>> I didn't have anything in mind; just that protocols don't require all
>> commands be fast.
>
> The default block layer timeout is 30 seconds because typical storage
> commands complete in much less than 30 seconds.
>
>> NVMe has wait event commands that might not ever complete.
>
> Are you perhaps referring to the NVMe Asynchronous Event Request
> command? That command doesn't count because the command ID for that
> command comes from another set than I/O commands. From the NVMe
> driver:
>
> static inline bool nvme_is_aen_req(u16 qid, __u16 command_id)
> {
> 	return !qid &&
> 		nvme_tag_from_cid(command_id) >= NVME_AQ_BLK_MQ_DEPTH;
> }
>
>> A copy command requesting multiple terabytes won't be quick for even
>> the fastest hardware (not "hours", but not fast).
>
> Is there any setup in which such large commands are submitted? Write
> commands that last long may negatively affect read latency. This is a
> good reason not to make the max_sectors value too large.

Even if max_sectors is not very large, if the device has a gigantic
write cache that needs to be flushed first to be able to process an
incoming write, then writes can be slow. I have seen issues in the
field with that causing timeouts.

An even worse case: HDDs doing on-media caching of writes when the
volatile write cache is disabled by the user. If the on-media write
cache then needs to be freed up to accept a new write, the HDD will be
very, very slow handling writes.

There are plenty of scenarios out there where a device can suddenly
become slow, hogging a lot of tags in the process.

>> If hardware stops responding, the tags are locked up for as long as it
>> takes recovery escalation to reclaim them. For nvme, error recovery
>> could take over a minute by default.
>
> If hardware stops responding, who cares about fairness of tag
> allocation since this means that request processing halts for all
> queues associated with the controller that locked up?

Considering the above, it is more about a horrible slowdown of all the
devices sharing a tagset because one of the devices is slow for
whatever reason.

Note: this is only my 2 cents input. I have not seen any issue in
practice with shared tagsets, but I do not think I have ever
encountered a system actually using that feature :)

--
Damien Le Moal
Western Digital Research
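
For context, "shared tagset" above refers to the blk-mq pattern in
which several request queues are created from a single struct
blk_mq_tag_set, so requests for all of the devices compete for one pool
of tags. Below is a minimal sketch of the pattern: the blk-mq calls are
the real kernel API, but the "example" driver structure, its names, the
queue depth, and the queue_rq stub are hypothetical, and error
unwinding (blk_mq_free_tag_set and friends) is omitted.

#include <linux/kernel.h>
#include <linux/blkdev.h>
#include <linux/blk-mq.h>

struct example_ctrl {
	struct blk_mq_tag_set tag_set;
	struct request_queue *queues[4];
};

/* Stub dispatch handler: immediately complete every request. */
static blk_status_t example_queue_rq(struct blk_mq_hw_ctx *hctx,
				     const struct blk_mq_queue_data *bd)
{
	blk_mq_start_request(bd->rq);
	blk_mq_end_request(bd->rq, BLK_STS_OK);
	return BLK_STS_OK;
}

static const struct blk_mq_ops example_mq_ops = {
	.queue_rq = example_queue_rq,
};

static int example_probe(struct example_ctrl *ctrl)
{
	struct blk_mq_tag_set *set = &ctrl->tag_set;
	int i, ret;

	set->ops = &example_mq_ops;
	set->nr_hw_queues = 1;
	set->queue_depth = 64;	/* one pool of 64 tags for everything */
	set->numa_node = NUMA_NO_NODE;

	ret = blk_mq_alloc_tag_set(set);
	if (ret)
		return ret;

	/*
	 * All queues created from the same tag_set share its tags: a
	 * request stuck on one device holds a tag that none of the
	 * other devices can use until it completes or times out.
	 */
	for (i = 0; i < ARRAY_SIZE(ctrl->queues); i++) {
		ctrl->queues[i] = blk_mq_init_queue(set);
		if (IS_ERR(ctrl->queues[i]))
			return PTR_ERR(ctrl->queues[i]);
	}

	return 0;
}

With that layout, the slow-device scenarios described above starve the
fast devices directly: tags held by commands waiting on, for example, a
write cache flush are simply unavailable to the other queues until
those commands complete or time out.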