Ming,

> However, it depends on if the target device returns the congestion to
> host. From my observation, looks there isn't such feedback from NVMe
> target.

It happens all the time with SCSI devices. It is imperative that this
keeps working.

> Even if there was such SSD target which provides such congestion
> feedback, bypassing .device_busy won't cause big effect too since
> blk-mq's SCHED_RESTART will retry this IO returning STS_RESOURCE only
> after another in-flight one is completed.

The reason we back off is that it allows the device to recover by
temporarily reducing its workload. In addition, the lower queue depth
alleviates the risk of commands timing out leading to application I/O
failures.

> At least, Broadcom guys tests this patch on megaraid raid and the
> results shows that big improvement was got, that is why the flag is
> only set on megaraid host.

I do not question that it improves performance. That's not my point.

> In theory, .track_queue_depth may only improve sequential IO's
> performance for HDD., not very effective for SSD. Or just save a bit
> CPU cycles in case of SSD.

This is not about performance. This is about how the system behaves when
a device is starved for resources or experiencing transient failures.

-- 
Martin K. Petersen	Oracle Linux Engineering