On 12/13/23 07:44, Bart Van Assche wrote:
> On 12/10/23 23:40, Damien Le Moal wrote:
>> On 12/9/23 03:40, Bart Van Assche wrote:
>>> My understanding is that blkcg_set_ioprio() is called from inside submit_bio()
>>> and hence that the reported issue cannot be solved by modifying F2FS. How about
>>> modifying the blk-ioprio policy such that it ignores zoned writes?
>>
>> I do not see a better solution than that at the moment. So yes, let's do that.
>> But please add a big comment in the code explaining why we ignore zoned writes.
>
> Hi Damien,
>
> We tested a patch for the blk-ioprio cgroup policy that makes it skip zoned writes.
> We noticed that such a patch is not sufficient to prevent unaligned write errors
> because some tasks have been assigned an I/O priority via the ionice command
> (ioprio_set() system call). I think it would be wrong to skip the assignment of an
> I/O priority for zoned writes in all code that can set an I/O priority. Since the
> root cause of this issue is the inability of the mq-deadline I/O scheduler to
> preserve the order for zoned writes with different I/O priorities, I think this
> issue should be fixed in the mq-deadline I/O scheduler.

Not necessarily. When the priority for an IO is set while a BIO is being
prepared, we know where that priority comes from:
1) The user kiocb, through aio_reqprio
2) The process ionice context
3) Priority cgroups

We can disable (2) and (3) for zoned writes and leave (1) as is.

Trying to solve this issue in mq-deadline would require keeping track of the
I/O priority used for a write request issued to a zone, and using that same
priority for all following write requests to the same zone until there are no
writes pending for that zone. Otherwise, you will get the priority inversion
that causes the reordering.

But I think that doing all this without also causing a priority inversion for
the user, i.e. a high priority write request ending up waiting for a low
priority one, will be challenging, to say the least.
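To make the tracking described above concrete, here is a rough toy model in
Python. It is not kernel code and not a proposed mq-deadline change: the class
ZonePrioTracker, its methods and the zone number are all hypothetical, made up
for illustration. It only shows the bookkeeping (one pinned priority per zone
while writes are pending) and the inversion that comes with it.

```python
from dataclasses import dataclass, field

# Lower value = higher priority, following the ordering of the kernel's
# I/O priority classes (RT < BE < IDLE). Used here only as plain integers.
PRIO_RT, PRIO_BE, PRIO_IDLE = 1, 2, 3


@dataclass
class ZonePrioTracker:
    """Hypothetical toy model: pin one priority per zone while writes are pending."""

    # zone number -> [pinned priority, number of pending writes]
    pinned: dict = field(default_factory=dict)

    def submit_write(self, zone: int, requested_prio: int) -> int:
        """Return the priority the write is actually queued with."""
        if zone in self.pinned:
            # Writes are already pending for this zone: reuse their priority
            # so that all writes to the zone stay in one queue, in order.
            self.pinned[zone][1] += 1
            return self.pinned[zone][0]
        # First pending write for this zone: its priority becomes the pin.
        self.pinned[zone] = [requested_prio, 1]
        return requested_prio

    def complete_write(self, zone: int) -> None:
        """Drop the pin once the zone has no pending writes left."""
        entry = self.pinned[zone]
        entry[1] -= 1
        if entry[1] == 0:
            del self.pinned[zone]


tracker = ZonePrioTracker()
first = tracker.submit_write(zone=7, requested_prio=PRIO_IDLE)
second = tracker.submit_write(zone=7, requested_prio=PRIO_RT)
# 'second' asked for RT but is queued as IDLE behind 'first': the write order
# is preserved, at the cost of the user-visible priority inversion.
print(first, second)  # prints "3 3"
```

The second write keeps its place in the zone's write order only because its
requested RT priority is ignored, which is exactly the user-visible inversion
I am worried about.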

--
Damien Le Moal
Western Digital Research