On 7/19/23 07:53, Bart Van Assche wrote:
> On 7/17/23 23:47, Damien Le Moal wrote:
>> On 7/11/23 03:01, Bart Van Assche wrote:
>>> Send commands that failed with an unaligned write error to the SCSI
>>> error handler. Let the SCSI error handler sort SCSI commands per LBA
>>> before resubmitting these.
>>>
>>> Increase the number of retries for write commands sent to a sequential
>>> zone to the maximum number of outstanding commands.
>>
>> I think I mentioned this before. When we started btrfs work, we did
>> something similar (but at the IO scheduler level) to try to avoid adding
>> a big lock in btrfs to serialize (and thus order) writes. What we
>> discovered is that it was extremely easy to fall into a situation where
>> the maximum number of possible outstanding requests is already issued,
>> but they all are behind a "hole" and indefinitely delayed because the
>> missing request cannot be issued due to the max nr request limit being
>> reached. No forward progress and deadlock.
>>
>> I do not see how your change addresses this problem. The same will
>> happen with this and I do not have any suggestion how to solve this.
>> For btrfs, we ended up using zone append emulation for scsi to avoid the
>> big lock and to spare the FS from having to order writes. That solution
>> guarantees forward progress. Delaying already issued writes that are not
>> sequential has no such guarantees.
>
> Hi Damien,
>
> Thank you for having explained in detail the scenario that you ran into.
>
> I think what has been explained above is a scenario in which the
> filesystem allocates requests per zone in another order than the LBA
> order. How about requiring that the filesystem allocates and submits
> zoned writes in LBA order per zone? I think that this is how F2FS
> supports zoned storage.

Sure. But what if an application uses the drive directly? You lose
guarantees of forward progress then.
Given that an application has to use direct IO for writes to sequential
zones, this is unlikely to happen in a "good" scenario, but it also would
not be hard to write an application that can deadlock the drive forever by
simply missing one write in a sequence of writes for a zone... That is my
concern. While f2fs would likely be OK, the delay approach is not solid
enough for all cases.

-- 
Damien Le Moal
Western Digital Research
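
For illustration only, the "hole" scenario discussed in this thread can be
sketched as a toy model. This is not the kernel or SCSI error handler code:
the function name `simulate`, the one-block-per-write model and the
`max_outstanding` parameter are all assumptions made for the example. Writes
that are not at the zone write pointer are held back (the "delay" approach)
while still occupying a slot, and the run is reported as stalled once no held
write can be dispatched and no further write can be admitted.

```python
from collections import deque


def simulate(submission_order, max_outstanding, write_pointer=0):
    """Toy model of delaying out-of-order writes to one sequential zone.

    submission_order: block offsets in the order the submitter issues them.
    max_outstanding:  hypothetical limit on writes that may be held at once.
    Returns (completed, stalled).
    """
    pending = deque(submission_order)  # writes not yet admitted
    held = set()                       # admitted writes waiting for their turn
    completed = []

    while pending or held:
        # Admit writes for as long as a slot is free.
        while pending and len(held) < max_outstanding:
            held.add(pending.popleft())

        if write_pointer in held:
            # The write at the write pointer can be dispatched.
            held.remove(write_pointer)
            completed.append(write_pointer)
            write_pointer += 1
        else:
            # Every held write is beyond the write pointer, and either all
            # slots are taken or nothing is left to admit: no forward progress.
            return completed, True

    return completed, False


if __name__ == "__main__":
    # In-order submission always makes progress.
    print(simulate([0, 1, 2, 3], max_outstanding=2))
    # The write for offset 0 is stuck behind two held writes: stall.
    print(simulate([1, 2, 0, 3], max_outstanding=2))
    # An application that never submits offset 2 stalls the zone.
    print(simulate([0, 1, 3], max_outstanding=4))
```

In this model a larger `max_outstanding` only moves the point at which the
stall occurs. A submitter that skips one write in the sequence stalls the
zone regardless of the limit, which is the concern raised above for
applications that use the drive directly.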