On 6/15/22 07:39, Bart Van Assche wrote:
> On 6/14/22 14:47, Khazhy Kumykov wrote:
>> On Tue, Jun 14, 2022 at 10:49 AM Bart Van Assche <bvanassche@xxxxxxx> wrote:
>>>
>>> From ZBC-2: "The device server terminates with CHECK CONDITION status, with
>>> the sense key set to ILLEGAL REQUEST, and the additional sense code set to
>>> UNALIGNED WRITE COMMAND a write command, other than an entire medium write
>>> same command, that specifies: a) the starting LBA in a sequential write
>>> required zone set to a value that is not equal to the write pointer for that
>>> sequential write required zone; or b) an ending LBA that is not equal to the
>>> last logical block within a physical block (see SBC-5)."
>>>
>>> I am not aware of any other conditions that may trigger the UNALIGNED
>>> WRITE COMMAND response.
>>>
>>> Retry unaligned writes in preparation of removing zone locking.
>> Is /just/ retrying effective here? A series of writes to the same zone
>> would all need to be sent in order - in the worst case (requests
>> somehow ordered in reverse order) this becomes quadratic as only 1
>> request "succeeds" out of the N outstanding requests, with the rest
>> all needing to retry. (Imagine a user writes an entire "zone" - which
>> could be split into hundreds of requests).
>>
>> Block layer / schedulers are free to do this reordering, which I
>> understand does happen whenever we need to requeue - and would result
>> in a retry of all writes after the first re-ordered request. (side
>> note: fwiw "requests somehow in reverse order" can happen - bfq
>> inherited cfq's odd behavior of sometimes issuing sequential IO in
>> reverse order due to back_seek, e.g.)
>
> Hi Khazhy,
>
> For zoned block devices I propose to only support those I/O schedulers
> that either preserve the LBA order or fix the LBA order if two or more
> out-of-order requests are received by the I/O scheduler.

We tried that "fix" as part of the work on zoned btrfs. It does not work.
Even adding a delay to wait for out-of-order requests (when there is a
hole in a write sequence) does not work reliably, because file systems may
sometimes take tens of seconds to issue all the write requests that could
be ordered into a clean sequential write stream. Even with that delay
increased to minutes, we were still seeing unaligned write errors.

>
> I agree that in the worst case the number of retries is proportional to
> the square of the number of pending requests. However, for the use case
> that matters most to me, F2FS on top of a UFS device, we haven't seen
> any retries in our tests without I/O scheduler. This is probably because
> of how F2FS submits writes combined with the UFS controller only
> supporting a single hardware queue. I expect to see a small number of
> retries once UFS controllers become available that support multiple
> hardware queues.
>
> Thanks,
>
> Bart.

-- 
Damien Le Moal
Western Digital Research
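
For reference, a minimal sketch of the worst case discussed above. It is a
toy model, not SCSI or block layer code, and it rests on these assumptions:
one sequential write required zone, every write covers exactly one block,
a write succeeds only if its starting LBA equals the write pointer, and
rejected writes are simply re-dispatched in their original relative order.
The function name count_dispatches is hypothetical.

```python
def count_dispatches(order):
    """Return (dispatches, retries) for one-block writes sent in `order`.

    `order` lists the starting LBA of each write, relative to the zone
    start. Toy model only: a write is accepted when its LBA equals the
    write pointer; otherwise it is requeued and sent again next pass.
    """
    pending = list(order)
    total = len(pending)
    write_pointer = 0
    dispatches = 0
    while pending:
        requeued = []
        for lba in pending:
            dispatches += 1
            if lba == write_pointer:
                write_pointer += 1      # aligned write: accepted
            else:
                requeued.append(lba)    # unaligned write: retry later
        if len(requeued) == len(pending):
            # No write matched the write pointer; retrying cannot help.
            raise ValueError("hole in the write sequence")
        pending = requeued
    return dispatches, dispatches - total


n = 200
in_order = count_dispatches(list(range(n)))
reverse = count_dispatches(list(range(n - 1, -1, -1)))
print(in_order, reverse)
```

Under these assumptions, n writes issued in LBA order need n dispatches
and no retries, while the fully reversed order needs n(n+1)/2 dispatches,
that is n(n-1)/2 retries: 20100 dispatches for n = 200. A sequence with a
missing LBA never completes in this model, which corresponds to the case
where the file system has not yet issued the write that fills the hole.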