On 1/10/23 09:48, Jens Axboe wrote:
> On 1/9/23 5:44 PM, Bart Van Assche wrote:
>> On 1/9/23 16:41, Jens Axboe wrote:
>>> Or, probably better, a stacked scheduler where the bottom one can be zone
>>> aware. Then we can get rid of littering the entire stack and IO schedulers
>>> with silly blk_queue_pipeline_zoned_writes() or blk_is_zoned_write() etc.
>>
>> Hi Jens,
>>
>> Isn't one of Damien's viewpoints that an I/O scheduler should not do
>> the reordering of write requests since reordering of write requests
>> may involve waiting for write requests, write requests that will never
>> be received if all tags have been allocated?
>
> It should be work conserving, it should not wait for anything. If
> there are holes or gaps, then there's nothing the scheduler can do.
>
> My point is that the strict ordering was pretty hacky when it went in,
> and rather than get better, it's proliferating. That's not a good
> direction.

Yes, and it is hard to maintain without breaking something.

Given that only writes need special handling, I am thinking that having a
dedicated write queue for submission/scheduling/requeue could significantly
clean things up. Essentially, we would have a different code path for zoned
device writes from submit_bio(). Something like:

	if (queue_is_zoned() && op_is_write())
		return blk_zoned_write_submit();

at the top of submit_bio(). That zone write code can be isolated in
block/blk-zoned.c, which avoids spreading "if (zoned)" all over the place.
E.g. the flush machinery reorders writes right now... That needs fixing,
so more "if (zoned)" is coming...

That special zone write queue could also do its own dispatch scheduling, so
there is no need to hack existing schedulers.

I need to try coding something to see how it goes.

-- 
Damien Le Moal
Western Digital Research
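
To make the idea a bit more concrete, here is a minimal userspace sketch in C
of what a dedicated zone write queue might do. It is an illustration only, not
kernel code: struct zone_wq, zone_wq_submit(), zone_wq_dispatch() and
takes_zoned_write_path() are all hypothetical names, and a single zone with a
small fixed-size pending array is assumed. Writes are accepted in any order and
dispatched only when one starts exactly at the zone write pointer. Dispatch is
work conserving: if there is a gap, it returns immediately instead of waiting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model. None of these names are kernel APIs. */
#define MAX_PENDING 16

enum op { OP_READ, OP_WRITE };

struct zwrite {
	uint64_t sector;     /* start sector of the write */
	uint32_t nr_sectors; /* length of the write in sectors */
};

struct zone_wq {
	uint64_t wp;                        /* next sector the zone accepts */
	struct zwrite pending[MAX_PENDING]; /* queued writes, in arrival order */
	size_t nr_pending;
};

/* Only writes to a zoned device take the separate path. */
static bool takes_zoned_write_path(bool zoned, enum op op)
{
	return zoned && op == OP_WRITE;
}

/* Queue a write in whatever order it arrives. Returns false if full. */
static bool zone_wq_submit(struct zone_wq *q, uint64_t sector,
			   uint32_t nr_sectors)
{
	if (q->nr_pending == MAX_PENDING)
		return false;
	q->pending[q->nr_pending++] = (struct zwrite){ sector, nr_sectors };
	return true;
}

/*
 * Dispatch the queued write that starts at the write pointer, if any.
 * Work conserving: on a hole or gap this returns false right away and
 * never waits for the missing write to show up.
 */
static bool zone_wq_dispatch(struct zone_wq *q, struct zwrite *out)
{
	assert(q->nr_pending <= MAX_PENDING);

	for (size_t i = 0; i < q->nr_pending; i++) {
		if (q->pending[i].sector != q->wp)
			continue;
		*out = q->pending[i];
		/* Remove by moving the last entry into the freed slot. */
		q->pending[i] = q->pending[--q->nr_pending];
		q->wp += out->nr_sectors;
		return true;
	}
	return false;
}
```

With this model, a write at sector 8 submitted before the write at sector 0
simply stays queued. Once the write at sector 0 arrives, both can be dispatched
in write pointer order, and no existing scheduler needs to know about zones.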