On 1/9/23 5:56 PM, Bart Van Assche wrote:
> On 1/9/23 16:48, Jens Axboe wrote:
>> On 1/9/23 5:44 PM, Bart Van Assche wrote:
>>> On 1/9/23 16:41, Jens Axboe wrote:
>>>> Or, probably better, a stacked scheduler where the bottom one can be zone
>>>> aware. Then we can get rid of littering the entire stack and IO schedulers
>>>> with silly blk_queue_pipeline_zoned_writes() or blk_is_zoned_write() etc.
>>>
>>> Hi Jens,
>>>
>>> Isn't one of Damien's viewpoints that an I/O scheduler should not do
>>> the reordering of write requests since reordering of write requests
>>> may involve waiting for write requests, write requests that will never
>>> be received if all tags have been allocated?
>>
>> It should be work conserving, it should not wait for anything. If
>> there are holes or gaps, then there's nothing the scheduler can do.
>>
>> My point is that the strict ordering was pretty hacky when it went in,
>> and rather than get better, it's proliferating. That's not a good
>> direction.
>
> Hi Jens,
>
> As you know one of the deeply embedded design choices in the blk-mq
> code is that reordering can happen at any time between submission of a
> request to the blk-mq code and request completion. I agree with that
> design choice.

Indeed. And getting rid of any ordering ops like barriers greatly
simplified things and fixed a number of issues related to that.

> For the use cases I'm looking at the sequential write required zone
> type works best. This zone type works best since it guarantees that
> data on the storage medium is sequential. This results in optimal
> sequential read performance.

That's a given.

> Combining these two approaches is not ideal and I agree that the
> combination of these two approaches adds some complexity. Personally I
> prefer to add a limited amount of complexity rather than implementing
> a new block layer from scratch.

I'm not talking about a new block layer at all, ordered devices are not
nearly important enough to warrant that kind of attention. Nor would it
be a good solution even if they were. I'm merely saying that I'm getting
more and more disgruntled with the direction that is being taken to
cater to these kinds of devices, and perhaps a much better idea is to
contain that complexity in a separate scheduler (be it stacked or not).

Because I'm really not thrilled to see the addition of various "is this
device ordered" all over the place, and now we are getting "is this
device ordered AND pipelined". Do you see what I mean? It's making
things _worse_, not better, and we really should be making it better
rather than pile more stuff on top of it.

-- 
Jens Axboe