On 7/27/20 1:16 PM, Kanchan Joshi wrote: > On Fri, Jul 24, 2020 at 10:00 PM Jens Axboe <axboe@xxxxxxxxx> wrote: >> >> On 7/24/20 9:49 AM, Kanchan Joshi wrote: >>> diff --git a/fs/io_uring.c b/fs/io_uring.c >>> index 7809ab2..6510cf5 100644 >>> --- a/fs/io_uring.c >>> +++ b/fs/io_uring.c >>> @@ -1284,8 +1301,15 @@ static void __io_cqring_fill_event(struct io_kiocb *req, long res, long cflags) >>> cqe = io_get_cqring(ctx); >>> if (likely(cqe)) { >>> WRITE_ONCE(cqe->user_data, req->user_data); >>> - WRITE_ONCE(cqe->res, res); >>> - WRITE_ONCE(cqe->flags, cflags); >>> + if (unlikely(req->flags & REQ_F_ZONE_APPEND)) { >>> + if (likely(res > 0)) >>> + WRITE_ONCE(cqe->res64, req->rw.append_offset); >>> + else >>> + WRITE_ONCE(cqe->res64, res); >>> + } else { >>> + WRITE_ONCE(cqe->res, res); >>> + WRITE_ONCE(cqe->flags, cflags); >>> + } >> >> This would be nice to keep out of the fast path, if possible. > > I was thinking of keeping a function-pointer (in io_kiocb) during > submission. That would have avoided this check......but argument count > differs, so it did not add up. But that'd grow the io_kiocb just for this use case, which is arguably even worse. Unless you can keep it in the per-request private data, but there's no more room there for the regular read/write side. >>> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h >>> index 92c2269..2580d93 100644 >>> --- a/include/uapi/linux/io_uring.h >>> +++ b/include/uapi/linux/io_uring.h >>> @@ -156,8 +156,13 @@ enum { >>> */ >>> struct io_uring_cqe { >>> __u64 user_data; /* sqe->data submission passed back */ >>> - __s32 res; /* result code for this event */ >>> - __u32 flags; >>> + union { >>> + struct { >>> + __s32 res; /* result code for this event */ >>> + __u32 flags; >>> + }; >>> + __s64 res64; /* appending offset for zone append */ >>> + }; >>> }; >> >> Is this a compatible change, both for now but also going forward? You >> could randomly have IORING_CQE_F_BUFFER set, or any other future flags. > > Sorry, I didn't quite understand the concern. CQE_F_BUFFER is not > used/set for write currently, so it looked compatible at this point. Not worried about that, since we won't ever use that for writes. But it is a potential headache down the line for other flags, if they apply to normal writes. > Yes, no room for future flags for this operation. > Do you see any other way to enable this support in io-uring? Honestly I think the only viable option is as we discussed previously, pass in a pointer to a 64-bit type where we can copy the additional completion information to. >> Layout would also be different between big and little endian, so not >> even that easy to set aside a flag for this. But even if that was done, >> we'd still have this weird API where liburing or the app would need to >> distinguish this cqe from all others based on... the user_data? Hence >> liburing can't do it, only the app would be able to. >> >> Just seems like a hack to me. > > Yes, only user_data to distinguish. Do liburing helpers need to look > at cqe->res (and decide something) before returning the cqe to > application? They generally don't, outside of the internal timeout. But it's an issue for the API, as it forces applications to handle the CQEs a certain way. Normally there's flexibility. This makes the append writes behave differently than everything else, which is never a good idea. -- Jens Axboe