On 17/11/2020 22:59, Bijan Mottahedeh wrote:
> 
>>> Support readv/writev with fixed buffers, and introduce IOSQE_FIXED_BUFFER,
>>> consistent with fixed files.
>>
>> I don't like it at all, see issues below. The actual implementation would
>> be much uglier.
>>
>> I propose you to split the series and push separately. Your first 6 patches
>> first, I don't have conceptual objections to them. Then registration sharing
>> (I still need to look it up). And then we can return to this, if you're not
>> yet convinced.
> 
> Ok. The sharing patch is actually the highest priority for us so it'd be great to know if you think it's in the right direction.
> 
> Should I submit them as they are or address your fixed_file_ref comments in Patch 4/8 as well? Would I need your prep patch beforehand?

Ok, there are 2 new patches in 5.10; you may wait for Jens to propagate them
to 5.11 or just cherry-pick them (no conflicts expected). On top of that, apply
("io_uring: share fixed_file_refs b/w multiple rsrcs"), to which I CC'ed you.
It's simple enough, so there shouldn't be many problems with it.

With that, you need to call io_set_resource_node() every time you take
a resource. That's it, _no_ extra ref_put for you to add in puts/free/etc.

Also, consider that all ref_nodes of all types should be hooked into a
single ->ref_list (e.g. file_data->ref_list).

> 
>>> +static ssize_t io_import_iovec_fixed(int rw, struct io_kiocb *req, void *buf,
>>> +				     unsigned segs, unsigned fast_segs,
>>> +				     struct iovec **iovec,
>>> +				     struct iov_iter *iter)
>>> +{
>>> +	struct io_ring_ctx *ctx = req->ctx;
>>> +	struct io_mapped_ubuf *imu;
>>> +	struct iovec *iov;
>>> +	u16 index, buf_index;
>>> +	ssize_t ret;
>>> +	unsigned long seg;
>>> +
>>> +	if (unlikely(!ctx->buf_data))
>>> +		return -EFAULT;
>>> +
>>> +	ret = import_iovec(rw, buf, segs, fast_segs, iovec, iter);
>>
>> Did you test it? import_iovec() does access_ok() against each iov_base,
>> which in your case are an index.
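
For illustration only (this is not part of the patch): a rough userspace model of why such a check would not reject an index, assuming access_ok() amounts to a range check against the top of the user address space. The names user_range_ok() and USER_ADDR_LIMIT, and the 47-bit limit, are hypothetical choices for this sketch and not kernel symbols.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the top of the user address space (47-bit). */
#define USER_ADDR_LIMIT ((uint64_t)1 << 47)

/*
 * Model only: accept [addr, addr + len) if it lies entirely below the
 * limit. Nothing here checks that the range is actually mapped.
 */
static bool user_range_ok(uint64_t addr, uint64_t len)
{
	return len <= USER_ADDR_LIMIT && addr <= USER_ADDR_LIMIT - len;
}
```

Under this model, an iov_base carrying index 768 (0x300) with a one-page length is accepted like any other low address, so a passing test does not show that the value was validated as an index.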
> 
> I used liburing:test/file-{register,update} as models for the equivalent buffer tests and they seem to work. I can send out the tests and the liburing changes if you want.
> 
> The fixed io test registers an empty iov table first:
> 
>         ret = io_uring_register_buffers(ring, iovs, UIO_MAXIOV);
> 
> It next updates the table with two actual buffers at offset 768:
> 
>         ret = io_uring_register_buffers_update(ring, 768, ups, 2);
> 
> It later uses the buffer at index 768 for writev similar to the file-register test (IOSQE_FIXED_BUFFER instead of IOSQE_FIXED_FILE):
> 
>         iovs[768].iov_base = (void *)768;
>         iovs[768].iov_len = pagesize;
> 
> 
>         io_uring_prep_writev(sqe, fd, iovs, 1, 0);
>         sqe->flags |= IOSQE_FIXED_BUFFER;
> 
>         ret = io_uring_submit(ring);
> 
> Below is the iovec returned from
> 
> io_import_iovec_fixed()
> -> io_import_vec()
> 
> {iov_base = 0x300 <dm_early_create+412>, iov_len = 4096}
> 
>>> +	if (ret < 0)
>>> +		return ret;
>>> +
>>> +	iov = (struct iovec *)iter->iov;
>>> +
>>> +	for (seg = 0; seg < iter->nr_segs; seg++) {
>>> +		buf_index = *(u16 *)(&iov[seg].iov_base);
>>
>> That's ugly, and also not consistent with rw_fixed, because iov_base is
>> used to calculate offset.
> 
> Would offset be applicable when using readv/writev?
> 
> My thinking was that for those cases, each iovec should be used exactly as registered.
> 
>>
>>> +		if (unlikely(buf_index < 0 || buf_index >= ctx->nr_user_bufs))
>>> +			return -EFAULT;
>>> +
>>> +		index = array_index_nospec(buf_index, ctx->nr_user_bufs);
>>> +		imu = io_buf_from_index(ctx, index);
>>> +		if (!imu->ubuf || !imu->len)
>>> +			return -EFAULT;
>>> +		if (iov[seg].iov_len > imu->len)
>>> +			return -EFAULT;
>>> +
>>> +		iov[seg].iov_base = (void *)imu->ubuf;
>>
>> Nope, that's not different from non registered version.
>> What import_fixed actually do is setting up the iter argument to point
>> to a bvec (a vector of struct page *).
> 
> So in fact, the buffers end up being pinned again because they are not being as bvecs?
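
To make the rw_fixed comparison concrete, here is a simplified userspace sketch (not the kernel code) of how the existing fixed read/write path is understood to treat the address: as a pointer inside the registered region, from which an offset is derived, rather than as a table index. struct reg_buf and fixed_offset() are hypothetical names for this sketch; the real path additionally uses the offset to position the iterator within the pre-pinned bvec array.

```c
#include <stdint.h>

/* Hypothetical model of one registered buffer: user address and length. */
struct reg_buf {
	uint64_t ubuf;
	uint64_t len;
};

/*
 * Return the byte offset of [addr, addr + len) inside the registered
 * buffer, or -1 if the range wraps around or falls outside the buffer.
 */
static int64_t fixed_offset(const struct reg_buf *rb, uint64_t addr, uint64_t len)
{
	uint64_t end = addr + len;

	if (end < addr)			/* addr + len wrapped around */
		return -1;
	if (addr < rb->ubuf || end > rb->ubuf + rb->len)
		return -1;
	return (int64_t)(addr - rb->ubuf);
}
```

With a two-page buffer registered at 0x10000, an address of 0x11000 yields offset 4096, while a value such as 768 is rejected as out of range. That is the sense in which an index placed in iov_base would not be consistent with how the fixed path reads the same field.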
> 
>>
>> So it either would need to keep a vector of bvecs, that's a vector of vectors,
>> that's not supported by iter, etc., so you'll also need to iterate over them
>> in io_read/write and so on. Or flat 2D structure into 1D, but that's still ugly.
> 
> So you're saying there's no clean way to create a readv/writev + fixed buffers API? It would've been nice to have a consistent API between files and buffers.
> 
> 
>>> @@ -5692,7 +5743,7 @@ static int io_timeout_remove_prep(struct io_kiocb *req,
>>>  {
>>>  	if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>>>  		return -EINVAL;
>>> -	if (unlikely(req->flags & (REQ_F_FIXED_FILE | REQ_F_BUFFER_SELECT)))
>>> +	if (unlikely(req->flags & (REQ_F_FIXED_RSRC | REQ_F_BUFFER_SELECT)))
>>
>> Why it's here?
>>
>> #define REQ_F_FIXED_RSRC	(REQ_F_FIXED_FILE | REQ_F_FIXED_BUFFER)
>> So, why do you | with REQ_F_BUFFER_SELECT again here?
> 
> I thought to group fixed files/buffers but distinguish them from selected buffers.
> 
>>> @@ -87,6 +88,8 @@ enum {
>>>  #define IOSQE_ASYNC		(1U << IOSQE_ASYNC_BIT)
>>>  /* select buffer from sqe->buf_group */
>>>  #define IOSQE_BUFFER_SELECT	(1U << IOSQE_BUFFER_SELECT_BIT)
>>> +/* use fixed buffer set */
>>> +#define IOSQE_FIXED_BUFFER	(1U << IOSQE_FIXED_BUFFER_BIT)
>>
>> Unfortunately, we're almost out of flags bits -- it's a 1 byte
>> field and 6 bits are already taken. Let's not use it.

-- 
Pavel Begunkov