Re: [PATCH 2/2] zonefs: use zone-append for AIO as well

Kanchan Joshi <joshiiitr@xxxxxxxxx> · Fri, 24 Jul 2020 19:27:32 +0530

On Wed, Jul 22, 2020 at 8:22 PM Christoph Hellwig <hch@xxxxxx> wrote:
>
> On Wed, Jul 22, 2020 at 12:43:21PM +0000, Johannes Thumshirn wrote:
> > On 21/07/2020 07:54, Christoph Hellwig wrote:
> > > On Mon, Jul 20, 2020 at 04:48:50PM +0000, Johannes Thumshirn wrote:
> > >> On 20/07/2020 15:45, Christoph Hellwig wrote:
> > >>> On Mon, Jul 20, 2020 at 10:21:18PM +0900, Johannes Thumshirn wrote:
> > >>>> On a successful completion, the position the data is written to is
> > >>>> returned via AIO's res2 field to the calling application.
> > >>>
> > >>> That is a major, and except for this changelog, undocumented ABI
> > >>> change.  We had the whole discussion about reporting append results
> > >>> in a few threads and the issues with that in io_uring.  So let's
> > >>> have that discussion there and don't mix it up with how zonefs
> > >>> writes data.  Without that a lot of the boilerplate code should
> > >>> also go away.
> > >>>
> > >>
> > >> OK maybe I didn't remember correctly, but wasn't this all around
> > >> io_uring and how we'd report the location back for raw block device
> > >> access?
> > >
> > > Report the write offset.  The author seems to be hell bent on making
> > > it block device specific, but that is a horrible idea as it is just
> > > as useful for normal file systems (or zonefs).

Patchset only made the feature opt-in, due to the constraints that we
had. ZoneFS was always considered and it fits as fine as block-IO.
You already know that  we did not have enough room in io-uring, which
did not really allow to think of other FS (any-size cached-writes).
After working on multiple schemes in io_uring, now we have 64bits, and
we will return absolute offset in bytes now (in V4).

But still, it comes at the cost of sacrificing the ability to do
short-write, which is fine for zone-append but may trigger
behavior-change for regular file-append.
Write may become short if
- spanning beyond end-of-file
- going beyond RLIMIT_FSIZE limit
- probably for MAX_NON_LFS as well

We need to fail all above cases if we extend the current model for
regular FS. And that may break existing file-append users.
Class of applications which just append without caring about the exact
location - attempt was not to affect these while we try to enable the
path for zone-append.

Patches use O/RWF_APPEND, but try to isolate appending-write
(IOCB_APPEND) from appending-write-that-returns-location
(IOCB_ZONE_APPEND - can be renamed when we actually have all that it
takes to apply the feature in regular FS).
Enabling block-IO and zoneFS now, and keeping regular-FS as future
work - hope that does not sound too bad!

> > After having looked into io_uring I don't this there is anything that
> > prevents io_uring from picking up the write offset from ki_complete's
> > res2 argument. As of now io_uring ignores the filed but that can be
> > changed.

We use ret2 of ki_complete to collect append-offset in io_uring too.
It's just that unlike aio it required some work to send it to user-space.

--
Kanchan Joshi