On 2020/07/22 21:43, Johannes Thumshirn wrote:
> On 21/07/2020 07:54, Christoph Hellwig wrote:
>> On Mon, Jul 20, 2020 at 04:48:50PM +0000, Johannes Thumshirn wrote:
>>> On 20/07/2020 15:45, Christoph Hellwig wrote:
>>>> On Mon, Jul 20, 2020 at 10:21:18PM +0900, Johannes Thumshirn wrote:
>>>>> On a successful completion, the position the data is written to is
>>>>> returned via AIO's res2 field to the calling application.
>>>>
>>>> That is a major, and except for this changelog, undocumented ABI
>>>> change. We had the whole discussion about reporting append results
>>>> in a few threads and the issues with that in io_uring. So let's
>>>> have that discussion there and don't mix it up with how zonefs
>>>> writes data. Without that a lot of the boilerplate code should
>>>> also go away.
>>>>
>>>
>>> OK maybe I didn't remember correctly, but wasn't this all around
>>> io_uring and how we'd report the location back for raw block device
>>> access?
>>
>> Report the write offset. The author seems to be hell bent on making
>> it block device specific, but that is a horrible idea as it is just
>> as useful for normal file systems (or zonefs).
>
> After having looked into io_uring I don't think there is anything that
> prevents io_uring from picking up the write offset from ki_complete's
> res2 argument. As of now io_uring ignores the field but that can be
> changed.
>
> The reporting of the write offset to user-space still needs to be
> decided on from an io_uring PoV.
>
> So the only thing that needs to be done from a zonefs perspective is
> documenting the use of res2 and CC linux-aio and linux-abi (including
> an update of the io_getevents man page).
>
> Or am I completely off track now?

That is the general idea. But Christoph's point was that reporting the
effective write offset back to user space can be done not only for zone
append, but also for regular FS/files that are opened with O_APPEND and
written with AIOs, legacy or io_uring.
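To make the O_APPEND case concrete, here is a minimal user-space sketch
using the raw legacy AIO syscalls. It is not taken from the patch: the
helper names aio_append_once() and demo_append_size() are hypothetical,
and it assumes a Linux system with AIO enabled. Both writes are
submitted with aio_offset = 0, yet the second lands after the first,
and nothing in the completion event tells the caller where. The sketch
does not look at res2, since reporting the offset there is only what is
being proposed in this thread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/aio_abi.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Submit one legacy AIO write and wait for its completion event.
 * aio_offset is deliberately left at 0: for an O_APPEND file the
 * kernel ignores it and writes at the current end of file.
 * Returns 0 on success, -1 on failure; *ev receives the completion.
 */
int aio_append_once(int fd, const void *buf, size_t len, struct io_event *ev)
{
	aio_context_t ctx = 0;	/* must be zero before io_setup() */
	struct iocb cb;
	struct iocb *cbs[1] = { &cb };
	long ret;

	assert(buf != NULL && ev != NULL);

	if (syscall(SYS_io_setup, 1, &ctx) < 0)
		return -1;

	memset(&cb, 0, sizeof(cb));
	cb.aio_fildes = fd;
	cb.aio_lio_opcode = IOCB_CMD_PWRITE;
	cb.aio_buf = (uint64_t)(uintptr_t)buf;
	cb.aio_nbytes = len;
	cb.aio_offset = 0;	/* ignored when fd was opened with O_APPEND */

	ret = syscall(SYS_io_submit, ctx, 1L, cbs);
	if (ret == 1)		/* block until the single event completes */
		ret = syscall(SYS_io_getevents, ctx, 1L, 1L, ev, NULL);

	syscall(SYS_io_destroy, ctx);
	return ret == 1 ? 0 : -1;
}

/*
 * Hypothetical demo helper: append two 4-byte records, both submitted
 * with aio_offset = 0, and return the resulting file size (-1 on error).
 */
off_t demo_append_size(const char *path)
{
	struct io_event ev;
	off_t size = -1;
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_APPEND, 0600);

	if (fd < 0)
		return -1;
	if (aio_append_once(fd, "AAAA", 4, &ev) == 0 && ev.res == 4 &&
	    aio_append_once(fd, "BBBB", 4, &ev) == 0 && ev.res == 4)
		size = lseek(fd, 0, SEEK_END);
	close(fd);
	return size;
}
```

If aio_offset were honored, the second record would overwrite the first
and the file would be 4 bytes long. With O_APPEND the file ends up 8
bytes long, and ev.res only reports the byte count, not the position.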
In the O_APPEND case, the aio->aio_offset field is ignored: the kiocb
position is initialized to the file size and then advanced by the write
size for the next AIO, so the user never sees the actual write offset of
its AIOs. Reporting that back for regular files too can be useful, even
though current applications can do without it (or do not use O_APPEND
because this is lacking).

Christoph, please loudly shout at me if I misunderstood you :)

For the regular FS/file case, getting the written file offset is simple:
we only need to use kiocb->ki_pos. That is not a per-FS change.

For the user interface, yes, I agree, res2 is the way to go. And we need
to decide how to do it for io_uring. That is an API change, backward
compatible for legacy AIO, but still a change. So the linux-aio and
linux-api lists should be consulted. Ideally, for io_uring, something
backward compatible would be nice too. Not sure how to do it yet.

Whatever the interface, plugging zonefs into it is the trivial part, as
you already did the heavy lifting by writing the async zone append path.

>
> Thanks,
> Johannes
>

-- 
Damien Le Moal
Western Digital Research