On 6/19/20 8:59 AM, Pavel Begunkov wrote: > On 19/06/2020 17:15, Jens Axboe wrote: >> On 6/19/20 3:41 AM, javier.gonz@xxxxxxxxxxx wrote: >>> Jens, >>> >>> Would you have time to answer a question below in this thread? >>> >>> On 18.06.2020 11:11, javier.gonz@xxxxxxxxxxx wrote: >>>> On 18.06.2020 08:47, Damien Le Moal wrote: >>>>> On 2020/06/18 17:35, javier.gonz@xxxxxxxxxxx wrote: >>>>>> On 18.06.2020 07:39, Damien Le Moal wrote: >>>>>>> On 2020/06/18 2:27, Kanchan Joshi wrote: >>>>>>>> From: Selvakumar S <selvakuma.s1@xxxxxxxxxxx> >>>>>>>> >>>>>>>> Introduce three new opcodes for zone-append - >>>>>>>> >>>>>>>> IORING_OP_ZONE_APPEND : non-vectord, similiar to IORING_OP_WRITE >>>>>>>> IORING_OP_ZONE_APPENDV : vectored, similar to IORING_OP_WRITEV >>>>>>>> IORING_OP_ZONE_APPEND_FIXED : append using fixed-buffers >>>>>>>> >>>>>>>> Repurpose cqe->flags to return zone-relative offset. >>>>>>>> >>>>>>>> Signed-off-by: SelvaKumar S <selvakuma.s1@xxxxxxxxxxx> >>>>>>>> Signed-off-by: Kanchan Joshi <joshi.k@xxxxxxxxxxx> >>>>>>>> Signed-off-by: Nitesh Shetty <nj.shetty@xxxxxxxxxxx> >>>>>>>> Signed-off-by: Javier Gonzalez <javier.gonz@xxxxxxxxxxx> >>>>>>>> --- >>>>>>>> fs/io_uring.c | 72 +++++++++++++++++++++++++++++++++++++++++-- >>>>>>>> include/uapi/linux/io_uring.h | 8 ++++- >>>>>>>> 2 files changed, 77 insertions(+), 3 deletions(-) >>>>>>>> >>>>>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c >>>>>>>> index 155f3d8..c14c873 100644 >>>>>>>> --- a/fs/io_uring.c >>>>>>>> +++ b/fs/io_uring.c >>>>>>>> @@ -649,6 +649,10 @@ struct io_kiocb { >>>>>>>> unsigned long fsize; >>>>>>>> u64 user_data; >>>>>>>> u32 result; >>>>>>>> +#ifdef CONFIG_BLK_DEV_ZONED >>>>>>>> + /* zone-relative offset for append, in bytes */ >>>>>>>> + u32 append_offset; >>>>>>> >>>>>>> this can overflow. u64 is needed. >>>>>> >>>>>> We chose to do it this way to start with because struct io_uring_cqe >>>>>> only has space for u32 when we reuse the flags. >>>>>> >>>>>> We can of course create a new cqe structure, but that will come with >>>>>> larger changes to io_uring for supporting append. >>>>>> >>>>>> Do you believe this is a better approach? >>>>> >>>>> The problem is that zone size are 32 bits in the kernel, as a number >>>>> of sectors. So any device that has a zone size smaller or equal to >>>>> 2^31 512B sectors can be accepted. Using a zone relative offset in >>>>> bytes for returning zone append result is OK-ish, but to match the >>>>> kernel supported range of possible zone size, you need 31+9 bits... >>>>> 32 does not cut it. >>>> >>>> Agree. Our initial assumption was that u32 would cover current zone size >>>> requirements, but if this is a no-go, we will take the longer path. >>> >>> Converting to u64 will require a new version of io_uring_cqe, where we >>> extend at least 32 bits. I believe this will need a whole new allocation >>> and probably ioctl(). >>> >>> Is this an acceptable change for you? We will of course add support for >>> liburing when we agree on the right way to do this. >> >> If you need 64-bit of return value, then it's not going to work. Even >> with the existing patches, reusing cqe->flags isn't going to fly, as >> it would conflict with eg doing zone append writes with automatic >> buffer selection. > > Buffer selection is for reads/recv kind of requests, but appends > are writes. In theory they can co-exist using cqe->flags. Yeah good point, since it's just writes, doesn't matter. But the other point still stands, it could potentially conflict with other flags, but I guess only to the extent where both flags would need extra storage in ->flags. So not a huge concern imho. -- Jens Axboe