On 1/9/23 7:24 PM, Damien Le Moal wrote:
> On 1/10/23 09:48, Jens Axboe wrote:
>> On 1/9/23 5:44 PM, Bart Van Assche wrote:
>>> On 1/9/23 16:41, Jens Axboe wrote:
>>>> Or, probably better, a stacked scheduler where the bottom one can be zone
>>>> aware. Then we can get rid of littering the entire stack and IO schedulers
>>>> with silly blk_queue_pipeline_zoned_writes() or blk_is_zoned_write() etc.
>>>
>>> Hi Jens,
>>>
>>> Isn't one of Damien's viewpoints that an I/O scheduler should not do
>>> the reordering of write requests since reordering of write requests
>>> may involve waiting for write requests, write requests that will never
>>> be received if all tags have been allocated?
>>
>> It should be work conserving, it should not wait for anything. If
>> there are holes or gaps, then there's nothing the scheduler can do.
>>
>> My point is that the strict ordering was pretty hacky when it went in,
>> and rather than get better, it's proliferating. That's not a good
>> direction.
>
> Yes, and hard to maintain/avoid breaking something.

Indeed! It's both fragile and ends up adding branches in a bunch of
spots in the generic code, which isn't ideal either from an efficiency
pov.

> Given that only writes need special handling, I am thinking that having a
> dedicated write queue for submission/scheduling/requeue could
> significantly clean things up. Essentially, we would have a different code
> path for zoned device write from submit_bio(). Something like:
>
> if (queue_is_zoned() && op_is_write())
> 	return blk_zoned_write_submit();
>
> at the top of submit_bio(). That zone write code can be isolated in
> block/blk-zoned.c and avoid spreading "if (zoned)" all over the place.
> E.g. the flush machinery reorders writes right now... That needs fixing,
> more "if (zoned)" coming...
>
> That special zone write queue could also do its own dispatch scheduling,
> so no need to hack existing schedulers.
This seems very reasonable, and would just have the one check at queue
time, and then one at requeue time (which is fine, that's not a fast
path in any case).

-- 
Jens Axboe
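
For illustration only, the "one check at queue time, one at requeue time"
idea can be modelled in plain userspace C. This is a rough sketch, not
kernel code: every name in it (model_queue, model_bio, model_submit_bio(),
model_requeue_bio(), model_zoned_write_submit()) is a hypothetical stand-in
invented for this example, and the counters exist only to make the routing
visible. It assumes nothing about how the real block layer would implement
the dedicated zone write queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
enum model_op { MODEL_OP_READ, MODEL_OP_WRITE };

enum model_path { MODEL_PATH_GENERIC, MODEL_PATH_ZONED_WRITE };

struct model_queue {
	bool zoned;                 /* device has sequential-write zones */
	unsigned int generic_cnt;   /* I/Os routed down the generic path */
	unsigned int zoned_wr_cnt;  /* writes routed to the zone write queue */
};

struct model_bio {
	enum model_op op;
};

bool model_op_is_write(const struct model_bio *bio)
{
	return bio->op == MODEL_OP_WRITE;
}

/* Stand-in for the isolated zone write path (ordering logic omitted). */
enum model_path model_zoned_write_submit(struct model_queue *q,
					 struct model_bio *bio)
{
	(void)bio;
	q->zoned_wr_cnt++;
	return MODEL_PATH_ZONED_WRITE;
}

/* Shared routing decision: the only place that asks "zoned write?". */
static enum model_path model_route(struct model_queue *q,
				   struct model_bio *bio)
{
	assert(q != NULL && bio != NULL);
	if (q->zoned && model_op_is_write(bio))
		return model_zoned_write_submit(q, bio);
	q->generic_cnt++;
	return MODEL_PATH_GENERIC;
}

/* Check number one: at submission (queue) time. */
enum model_path model_submit_bio(struct model_queue *q, struct model_bio *bio)
{
	return model_route(q, bio);
}

/* Check number two: at requeue time, which is not a fast path. */
enum model_path model_requeue_bio(struct model_queue *q, struct model_bio *bio)
{
	return model_route(q, bio);
}
```

In this model, reads and all I/O to non-zoned queues never touch the zone
write code, so the generic path carries no further "if (zoned)" branches.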