On 2020/07/31 16:59, Kanchan Joshi wrote:
> On Fri, Jul 31, 2020 at 12:29 PM Damien Le Moal <Damien.LeMoal@xxxxxxx> wrote:
>>
>> On 2020/07/31 15:45, hch@xxxxxxxxxxxxx wrote:
>>> On Fri, Jul 31, 2020 at 06:42:10AM +0000, Damien Le Moal wrote:
>>>>> - We may not be able to use RWF_APPEND, and need exposing a new
>>>>> type/flag (RWF_INDIRECT_OFFSET etc.) user-space. Not sure if this
>>>>> sounds outrageous, but is it OK to have uring-only flag which can be
>>>>> combined with RWF_APPEND?
>>>>
>>>> Why ? Where is the problem ? O_APPEND/RWF_APPEND is currently meaningless for
>>>> raw block device accesses. We could certainly define a meaning for these in the
>>>> context of zoned block devices.
>>>
>>> We can't just add a meaning for O_APPEND on block devices now,
>>> as it was previously silently ignored. I also really don't think any
>>> of these semantics even fit the block device to start with. If you
>>> want to work on raw zones use zonefs, that's what it exists for.
>>
>> Which is fine with me. Just trying to say that I think this is exactly the
>> discussion we need to start with. What interface do we implement...
>>
>> Allowing zone append only through zonefs as the raw block device equivalent, all
>> the O_APPEND/RWF_APPEND semantic is defined and the "return written offset"
>> implementation in VFS would be common for all file systems, including regular
>> ones. Besides that, there is I think the question of short writes... Not sure if
>> short writes can currently happen with async RWF_APPEND writes to regular files.
>> I think not but that may depend on the FS.
>
> generic_write_check_limits (called by generic_write_checks, used by
> most FS) may make it short, and AFAIK it does not depend on
> async/sync.

Johannes has a patch (not posted yet) fixing all this for zonefs,
differentiating sync and async cases, allowing short writes or not, etc. This
was done by not using generic_write_check_limits() and instead writing a
zonefs_check_write() function that is zone append friendly.

We can post that as a base for the discussion on semantics if you want...

> This was one of the reasons why we chose to isolate the operation by a
> different IOCB flag and not by IOCB_APPEND alone.

For zonefs, the plan is:
* For the sync write case, zone append is always used.
* For the async write case, if we see IOCB_APPEND, then zone append BIOs are
  used. If not, regular write BIOs are used.

Simple enough I think. No need for a new flag.

-- 
Damien Le Moal
Western Digital Research
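
For illustration only, the zonefs plan described in this message (sync writes always use zone append; async writes use zone append only when IOCB_APPEND is seen) could be sketched roughly as below. This is not the unposted zonefs patch and not kernel code: SKETCH_IOCB_APPEND, enum sketch_bio_kind and sketch_pick_bio() are hypothetical stand-ins invented for this sketch, and the flag value is arbitrary.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's IOCB_APPEND flag; value is arbitrary. */
#define SKETCH_IOCB_APPEND (1u << 0)

/* Hypothetical labels for the two kinds of BIO discussed in the thread. */
enum sketch_bio_kind {
	SKETCH_BIO_WRITE,        /* regular write BIO */
	SKETCH_BIO_ZONE_APPEND,  /* zone append BIO */
};

/*
 * Pick the BIO kind for a write, following the plan in the message:
 *  - sync writes always use zone append;
 *  - async writes use zone append only when the append flag is set,
 *    and regular writes otherwise.
 */
static enum sketch_bio_kind sketch_pick_bio(bool is_sync, unsigned int iocb_flags)
{
	if (is_sync)
		return SKETCH_BIO_ZONE_APPEND;
	if (iocb_flags & SKETCH_IOCB_APPEND)
		return SKETCH_BIO_ZONE_APPEND;
	return SKETCH_BIO_WRITE;
}
```

With this shape the decision depends only on the sync/async distinction and the existing append flag, which is the point made above about not needing a new flag.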