On 2020/06/18 17:35, javier.gonz@xxxxxxxxxxx wrote:
> On 18.06.2020 07:39, Damien Le Moal wrote:
>> On 2020/06/18 2:27, Kanchan Joshi wrote:
>>> From: Selvakumar S <selvakuma.s1@xxxxxxxxxxx>
>>>
>>> Introduce three new opcodes for zone-append -
>>>
>>>    IORING_OP_ZONE_APPEND       : non-vectored, similar to IORING_OP_WRITE
>>>    IORING_OP_ZONE_APPENDV      : vectored, similar to IORING_OP_WRITEV
>>>    IORING_OP_ZONE_APPEND_FIXED : append using fixed-buffers
>>>
>>> Repurpose cqe->flags to return zone-relative offset.
>>>
>>> Signed-off-by: SelvaKumar S <selvakuma.s1@xxxxxxxxxxx>
>>> Signed-off-by: Kanchan Joshi <joshi.k@xxxxxxxxxxx>
>>> Signed-off-by: Nitesh Shetty <nj.shetty@xxxxxxxxxxx>
>>> Signed-off-by: Javier Gonzalez <javier.gonz@xxxxxxxxxxx>
>>> ---
>>>  fs/io_uring.c                 | 72 +++++++++++++++++++++++++++++++++++++++++--
>>>  include/uapi/linux/io_uring.h |  8 ++++-
>>>  2 files changed, 77 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index 155f3d8..c14c873 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -649,6 +649,10 @@ struct io_kiocb {
>>>  	unsigned long		fsize;
>>>  	u64			user_data;
>>>  	u32			result;
>>> +#ifdef CONFIG_BLK_DEV_ZONED
>>> +	/* zone-relative offset for append, in bytes */
>>> +	u32			append_offset;
>>
>> this can overflow. u64 is needed.
>
> We chose to do it this way to start with because struct io_uring_cqe
> only has space for u32 when we reuse the flags.
>
> We can of course create a new cqe structure, but that will come with
> larger changes to io_uring for supporting append.
>
> Do you believe this is a better approach?

The problem is that zone sizes are 32 bits in the kernel, as a number of
sectors. So any device that has a zone size smaller than or equal to 2^31
512B sectors can be accepted. Using a zone-relative offset in bytes for
returning the zone append result is OK-ish, but to match the kernel-supported
range of possible zone sizes, you need 31+9 bits... 32 does not cut it.

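To spell out the arithmetic, here is a small userspace sketch (not kernel
code; ZONE_SECTORS_MAX, zone_off_bytes(), zone_off_bytes_u32() and
bits_needed() are names made up for illustration, and the 2^31 sector limit
is simply the figure quoted above):

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT		9		/* 512B sectors */
/* Assumption for illustration: the largest zone size discussed above. */
#define ZONE_SECTORS_MAX	(1ULL << 31)

/* Byte offset of a sector within its zone, kept in 64 bits. */
static inline uint64_t zone_off_bytes(uint64_t sector_in_zone)
{
	assert(sector_in_zone < ZONE_SECTORS_MAX);
	return sector_in_zone << SECTOR_SHIFT;
}

/* Same offset squeezed into 32 bits, as a u32 append_offset would hold it. */
static inline uint32_t zone_off_bytes_u32(uint64_t sector_in_zone)
{
	return (uint32_t)zone_off_bytes(sector_in_zone);
}

/* Number of bits needed to represent v (0 for v == 0). */
static inline unsigned int bits_needed(uint64_t v)
{
	unsigned int n = 0;

	while (v) {
		n++;
		v >>= 1;
	}
	return n;
}
```

With the largest zone, the last sector sits at byte 0xFFFFFFFE00 from the
zone start, which needs 40 bits (31+9). The u32 variant wraps back to 0 as
soon as the offset reaches 4GiB, that is at sector 2^23 within the zone.
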
Since you need a 64-bit sized result, I would also prefer that you drop the
zone-relative offset as a result and return the absolute offset instead. That
makes life easier for applications, since zone append requests must also use
absolute offsets for the zone start. An absolute offset as a result is
consistent with that and with all other read/write system calls, which all
use absolute offsets (lseek() is the only one that I know of that can use a
relative offset, but that is not an IO system call).

>
> Javier
>

-- 
Damien Le Moal
Western Digital Research