On 1/10/23 08:51, Bart Van Assche wrote:
> On 1/9/23 15:46, Damien Le Moal wrote:
>> On 1/10/23 08:27, Bart Van Assche wrote:
>>> Measurements have shown that limiting the queue depth to one for zoned
>>> writes has a significant negative performance impact on zoned UFS devices.
>>> Hence this patch that disables zone locking from the mq-deadline scheduler
>>> for storage controllers that support pipelining zoned writes. This patch is
>>> based on the following assumptions:
>>> - Applications submit write requests to sequential write required zones
>>>   in order.
>>> - It happens infrequently that zoned write requests are reordered by the
>>>   block layer.
>>> - The storage controller does not reorder write requests that have been
>>>   submitted to the same hardware queue. This is the case for UFS: the
>>>   UFSHCI specification requires that UFS controllers process requests in
>>>   order per hardware queue.
>>> - The I/O priority of all pipelined write requests is the same per zone.
>>> - Either no I/O scheduler is used or an I/O scheduler is used that
>>>   submits write requests per zone in LBA order.
>>>
>>> If applications submit write requests to sequential write required zones
>>> in order, at least one of the pending requests will succeed. Hence, the
>>> number of retries that is required is at most (number of pending
>>> requests) - 1.
>>
>> But if the mid-layer decides to requeue a write request, the workqueue
>> used in the mq block layer for requeuing is going to completely destroy
>> write ordering as that is outside of the submission path, working in
>> parallel with it... Does blk_queue_pipeline_zoned_writes() == true also
>> guarantee that a write request will *never* be requeued before hitting the
>> adapter/device ?
>
> We don't need the guarantee that reordering will never happen. What we
> need is that reordering happens infrequently (e.g. less than 1% of the
> cases). This is what the last paragraph before your reply refers to.
> Maybe I should expand that paragraph.

But my point is that if a request goes through the block layer requeue, it
will be out of order, and will be submitted out of order again, and will
fail again. Unless you stall dispatching, wait for all requeues to come
back in the scheduler, and then start trying again, I do not see how you
can guarantee that retrying the unaligned writes will ever succeed.

I am talking in the context of host-managed devices here.

-- 
Damien Le Moal
Western Digital Research
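
The retry argument in this thread can be illustrated with a toy model. The sketch below is not kernel code and does not come from the patch: `dispatch` and `rounds_until_done` are hypothetical names, every write is one block long, and the zone rejects any write whose LBA does not match the write pointer. It also assumes that no new writes arrive while retries are in flight, which is roughly the "stall dispatching" condition discussed above.

```python
def dispatch(arrival_order, write_pointer):
    """Submit single-block writes to one zone in the given arrival order.

    A write is accepted only if its LBA equals the current write pointer;
    anything else is rejected as unaligned. Returns the new write pointer
    and the rejected LBAs, in arrival order.
    """
    rejected = []
    for lba in arrival_order:
        if lba == write_pointer:
            write_pointer += 1
        else:
            rejected.append(lba)
    return write_pointer, rejected


def rounds_until_done(arrival_order, reorder):
    """Count dispatch rounds until every write lands.

    reorder() models the order in which rejected writes are resubmitted.
    """
    write_pointer, pending, rounds = 0, list(arrival_order), 0
    while pending:
        write_pointer, rejected = dispatch(pending, write_pointer)
        pending = reorder(rejected)
        rounds += 1
    return rounds


# Write 0 was requeued, so it reaches the device after writes 1, 2 and 3.
arrival = [1, 2, 3, 0]

# Retries collected and resubmitted in LBA order.
in_order = rounds_until_done(arrival, sorted)

# Retries resubmitted in the worst possible order every time.
worst = rounds_until_done(arrival, lambda r: sorted(r, reverse=True))
```

In this model, sorting the rejected writes before resubmitting them finishes in one extra round, while an adversarial resubmission order makes progress by only one write per round. The model says nothing about how often the real requeue path reorders requests, nor about what happens when new writes keep arriving during the retries.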