On 8/23/21 8:40 PM, Jens Axboe wrote: > On 8/23/21 1:13 PM, Josh Triplett wrote: >> On Sat, Aug 21, 2021 at 08:18:12PM -0600, Jens Axboe wrote: >>> On 8/21/21 9:52 AM, Pavel Begunkov wrote: >>>> Add an optional feature to open/accept directly into io_uring's fixed >>>> file table bypassing the normal file table. Same behaviour if as the >>>> snippet below, but in one operation: >>>> >>>> sqe = prep_[open,accept](...); >>>> cqe = submit_and_wait(sqe); >>>> io_uring_register_files_update(uring_idx, (fd = cqe->res)); >>>> close((fd = cqe->res)); >>>> >>>> The idea in pretty old, and was brough up and implemented a year ago >>>> by Josh Triplett, though haven't sought the light for some reasons. >>>> >>>> The behaviour is controlled by setting sqe->file_index, where 0 implies >>>> the old behaviour. If non-zero value is specified, then it will behave >>>> as described and place the file into a fixed file slot >>>> sqe->file_index - 1. A file table should be already created, the slot >>>> should be valid and empty, otherwise the operation will fail. >>>> >>>> we can't use IOSQE_FIXED_FILE to switch between modes, because accept >>>> takes a file, and it already uses the flag with a different meaning. >>>> >>>> since RFC: >>>> - added attribution >>>> - updated descriptions >>>> - rebased >>>> >>>> since v1: >>>> - EBADF if slot is already used (Josh Triplett) >>>> - alias index with splice_fd_in (Josh Triplett) >>>> - fix a bound check bug >>> >>> With the prep series, this looks good to me now. Josh, what do you >>> think? >> >> I would still like to see this using a union with the `nofile` field in >> io_open and io_accept, rather than overloading the 16-bit buf_index >> field. That would avoid truncating to 16 bits, and make less work for >> expansion to more than 16 bits of fixed file indexes. >> >> (I'd also like that to actually use a union, rather than overloading the >> meaning of buf_index/nofile.) > > Agree, and in fact there's room in the open and accept command parts, so > we can just make it a separate entry there instead of using ->buf_index. > Then just pass in the index to io_install_fixed_file() instead of having > it pull it from req->buf_index. That's internal details, can be expanded at wish in the future, if we'd ever need larger tables. ->buf_index already holds indexes to different resources just fine. Aliasing with nofile would rather be ugly, so the only option, as you mentioned, is to grab some space from open/accept structs, but don't see why we'd want it when there is a more convenient alternative. >> I personally still feel that using non-zero to signify index-plus-one is >> both error-prone and not as future-compatible. I think we could do >> better with no additional overhead. But I think the final call on that >> interface is up to you, Jens. Do you think it'd be worth spending a flag >> bit or using a different opcode, to get a cleaner interface? If you >> don't, then I'd be fine with seeing this go in with just the io_open and >> io_accept change. > > I'd be inclined to go the extra opcode route instead, as the flag only > really would make sense to requests that instantiate file descriptors. > For this particular case, we'd need 3 new opcodes for > openat/openat2/accept, which is probably a worthwhile expenditure. > > Pavel, what do you think? Switch to using a different opcode for the new > requests, and just grab some space in io_open and io_accept for the fd > and pass it in to install. I don't get it, why it's even called hackish? How that's anyhow better? To me the feature looks like a natural extension to the operations, just like a read can be tuned with flags, so and creating new opcodes seems a bit ugly, unnecessary taking space from opcodes and adding duplication (even if both versions call the same handler). First, why it's not future-compatible? It's a serious argument, but I don't see where it came from. Do I miss something? It's u32 now, and so will easily cover all indexes. SQE fields should always be zeroed, that's a rule, liburing follows it, and there would have been already lots of problems for users not honoring it. And there will be a helper hiding all the index conversions for convenience. void io_uring_prep_open_direct(sqe, index, ...) { io_uring_prep_open(sqe, ...); sqe->file_index = index + 1; } -- Pavel Begunkov