On Wed, Jul 22, 2020 at 8:22 PM Christoph Hellwig <hch@xxxxxx> wrote:
>
> On Wed, Jul 22, 2020 at 12:43:21PM +0000, Johannes Thumshirn wrote:
> > On 21/07/2020 07:54, Christoph Hellwig wrote:
> > > On Mon, Jul 20, 2020 at 04:48:50PM +0000, Johannes Thumshirn wrote:
> > >> On 20/07/2020 15:45, Christoph Hellwig wrote:
> > >>> On Mon, Jul 20, 2020 at 10:21:18PM +0900, Johannes Thumshirn wrote:
> > >>>> On a successful completion, the position the data is written to is
> > >>>> returned via AIO's res2 field to the calling application.
> > >>>
> > >>> That is a major, and except for this changelog, undocumented ABI
> > >>> change. We had the whole discussion about reporting append results
> > >>> in a few threads and the issues with that in io_uring. So let's
> > >>> have that discussion there and don't mix it up with how zonefs
> > >>> writes data. Without that a lot of the boilerplate code should
> > >>> also go away.
> > >>>
> > >>
> > >> OK maybe I didn't remember correctly, but wasn't this all around
> > >> io_uring and how we'd report the location back for raw block device
> > >> access?
> > >
> > > Report the write offset. The author seems to be hell bent on making
> > > it block device specific, but that is a horrible idea as it is just
> > > as useful for normal file systems (or zonefs).

Patchset only made the feature opt-in, due to the constraints that we
had. ZoneFS was always considered and it fits as fine as block-IO.
You already know that we did not have enough room in io-uring, which
did not really allow to think of other FS (any-size cached-writes).
After working on multiple schemes in io_uring, now we have 64bits, and
we will return absolute offset in bytes now (in V4).

But still, it comes at the cost of sacrificing the ability to do
short-write, which is fine for zone-append but may trigger
behavior-change for regular file-append.
Write may become short if
- spanning beyond end-of-file
- going beyond RLIMIT_FSIZE limit
- probably for MAX_NON_LFS as well

We need to fail all of the above cases if we extend the current model
to regular FS. And that may break existing file-append users: the
class of applications which just append without caring about the exact
location. The attempt was not to affect these while we try to enable
the path for zone-append.

Patches use O/RWF_APPEND, but try to isolate appending-write
(IOCB_APPEND) from appending-write-that-returns-location
(IOCB_ZONE_APPEND - can be renamed when we actually have all that it
takes to apply the feature in regular FS).
Enabling block-IO and zoneFS now, and keeping regular-FS as future
work - hope that does not sound too bad!

> > After having looked into io_uring I don't think there is anything that
> > prevents io_uring from picking up the write offset from ki_complete's
> > res2 argument. As of now io_uring ignores the field but that can be
> > changed.

We use ret2 of ki_complete to collect append-offset in io_uring too.
It's just that unlike aio it required some work to send it to
user-space.

--
Kanchan Joshi
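
To make the RLIMIT_FSIZE case from the short-write list above concrete,
here is a minimal user-space sketch. It is not taken from the patchset.
It assumes Linux and a regular file on a filesystem that goes through
the generic write checks; the helper name demo_short_append and the
4096/8192 sizes are purely illustrative.

```python
import errno
import os
import resource
import signal
import tempfile


def demo_short_append(limit=4096, request=8192):
    """Append `request` bytes to an empty regular file while RLIMIT_FSIZE
    is lowered to `limit`. Hypothetical helper, for illustration only.

    Returns (bytes accepted by the first write, errno of the second write
    or None if the second write unexpectedly succeeded).
    """
    old_soft, old_hard = resource.getrlimit(resource.RLIMIT_FSIZE)
    # Ignore SIGXFSZ so that an over-limit write reports EFBIG instead of
    # terminating the process.
    old_handler = signal.signal(signal.SIGXFSZ, signal.SIG_IGN)

    tmp_fd, path = tempfile.mkstemp()
    os.close(tmp_fd)
    fd = os.open(path, os.O_WRONLY | os.O_APPEND)
    try:
        # Lower only the soft limit; the hard limit is left untouched.
        resource.setrlimit(resource.RLIMIT_FSIZE, (limit, old_hard))

        # The request straddles the limit: only part of it can be written.
        written = os.write(fd, b"x" * request)

        # The file now sits at the limit: nothing more can be appended.
        try:
            os.write(fd, b"y")
            second_errno = None
        except OSError as exc:
            second_errno = exc.errno
        return written, second_errno
    finally:
        resource.setrlimit(resource.RLIMIT_FSIZE, (old_soft, old_hard))
        if old_handler is not None:
            signal.signal(signal.SIGXFSZ, old_handler)
        os.close(fd)
        os.unlink(path)
```

On such a setup the first write should be truncated to the 4096 bytes
that still fit and the second should fail with EFBIG. A plain appender
can simply retry or stop on the short count. An append that must report
a single location for the whole request cannot, which is why those
cases would have to fail outright.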