Re: IORING_OP_FILES_UPDATE fail with ECANCELED when IOSQE_IO_LINK is used

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 14 Jul 2020 at 08:51, Daniele Salvatore Albano
<d.albano@xxxxxxxxx> wrote:
>
> Sure thing, I will send a patch later on.
>
> Thanks!
>
> On Tue, 14 Jul 2020, 03:20 Jens Axboe, <axboe@xxxxxxxxx> wrote:
>>
>> On 7/13/20 4:55 PM, Daniele Salvatore Albano wrote:
>> > Hi everyone,
>> >
>> > I am trying to use IORING_OP_FILES_UPDATE  in combination with
>> > IOSQE_IO_LINK and IORING_OP_RECV / IORING_OP_RECV as linked op but I
>> > keep getting ECANCELED (errno 125).
>> >
>> > I am using io_uring (kernel 5.8.0-rc4 built 3 days ago) and liburing (tag 0.7).
>> >
>> > I went through the test cases and I wasn't able to find any
>> > combination of the OP and the flag and I can't find any related docs
>> > so I am not sure if the combo isn't allowed.
>> >
>> > Although I have found
>> > https://github.com/torvalds/linux/blob/v5.8-rc5/fs/io_uring.c#L4926
>> >
>> >  if (sqe->flags || sqe->ioprio || sqe->rw_flags)
>> >     return -EINVAL;
>> >
>> > Not sure if this is the reason for which the linked operation is
>> > failing, I don't see in the other *_prep sqe->flags being broadly
>> > checked in general.
>> >
>> > I wrote two simple test cases that perform the following sequence of operations:
>> > - open a local file (for the two test cases below /proc/cmdline)
>> > - IORING_OP_FILES_UPDATE +  IOSQE_IO_LINK (only in the first test case)
>> > - IORING_OP_READ + IOSQE_FIXED_FILE
>> >
>> > Here a small test case to trigger the issue I am facing
>> >
>> > int main() {
>> >     struct io_uring ring = {0};
>> >     uint32_t head, count = 0;
>> >     struct io_uring_sqe *sqe = NULL;
>> >     struct io_uring_cqe *cqe = NULL;
>> >     uint32_t files_map_count = 16;
>> >     const int *files_map_registered = malloc(sizeof(int) * files_map_count);
>> >     memset((void*)files_map_registered, 0, sizeof(int) * files_map_count);
>> >
>> >     io_uring_queue_init(16, &ring, 0);
>> >     io_uring_register_files(&ring, files_map_registered, files_map_count);
>> >
>> >     int fd = open("/proc/cmdline", O_RDONLY);
>> >     int fd_index = 10;
>> >
>> >     sqe = io_uring_get_sqe(&ring);
>> >     io_uring_prep_files_update(sqe, &fd, 1, fd_index);
>> >     io_uring_sqe_set_flags(sqe, IOSQE_IO_LINK);
>> >     sqe->user_data = 1;
>> >
>> >     char buffer[512] = {0};
>> >     sqe = io_uring_get_sqe(&ring);
>> >     io_uring_prep_read(sqe, fd_index, &buffer, sizeof(buffer), 0);
>> >     io_uring_sqe_set_flags(sqe, IOSQE_FIXED_FILE);
>> >     sqe->user_data = 2;
>> >
>> >     io_uring_submit_and_wait(&ring, 2);
>> >
>> >     io_uring_for_each_cqe(&ring, head, cqe) {
>> >         count++;
>> >
>> >         fprintf(stdout, "count = %d\n", count);
>> >         fprintf(stdout, "cqe->res = %d\n", cqe->res);
>> >         fprintf(stdout, "cqe->user_data = %llu\n", cqe->user_data);
>> >         fprintf(stdout, "cqe->flags = %u\n", cqe->flags);
>> >     }
>> >
>> >     io_uring_cq_advance(&ring, count);
>> >
>> >     io_uring_unregister_files(&ring);
>> >     io_uring_queue_exit(&ring);
>> >
>> >     return 0;
>> > }
>> >
>> > It will report for both the cqes res = -125
>> >
>> > Instead if the code doesn't link and wait for the read it works as I
>> > am expecting.
>> >
>> > int main() {
>> >     struct io_uring ring = {0};
>> >     uint32_t head, count = 0;
>> >     char buffer[512] = {0};
>> >     struct io_uring_sqe *sqe = NULL;
>> >     struct io_uring_cqe *cqe = NULL;
>> >     uint32_t files_map_count = 16;
>> >     const int *files_map_registered = malloc(sizeof(int) * files_map_count);
>> >     memset((void*)files_map_registered, 0, sizeof(int) * files_map_count);
>> >
>> >     io_uring_queue_init(16, &ring, 0);
>> >     io_uring_register_files(&ring, files_map_registered, files_map_count);
>> >
>> >     int fd = open("/proc/cmdline", O_RDONLY);
>> >     int fd_index = 10;
>> >
>> >     sqe = io_uring_get_sqe(&ring);
>> >     io_uring_prep_files_update(sqe, &fd, 1, fd_index);
>> >     io_uring_sqe_set_flags(sqe, 0);
>> >     sqe->user_data = 1;
>> >
>> >     int exit_loop = 0;
>> >     do {
>> >         io_uring_submit_and_wait(&ring, 1);
>> >
>> >         io_uring_for_each_cqe(&ring, head, cqe) {
>> >             count++;
>> >
>> >             fprintf(stdout, "count = %d\n", count);
>> >             fprintf(stdout, "cqe->res = %d\n", cqe->res);
>> >             fprintf(stdout, "cqe->user_data = %llu\n", cqe->user_data);
>> >             fprintf(stdout, "cqe->flags = %u\n", cqe->flags);
>> >
>> >             if (cqe->user_data == 1) {
>> >                 sqe = io_uring_get_sqe(&ring);
>> >                 io_uring_prep_read(sqe, fd_index, &buffer, sizeof(buffer), 0);
>> >                 io_uring_sqe_set_flags(sqe, IOSQE_FIXED_FILE);
>> >                 sqe->user_data = 2;
>> >             } else {
>> >                 if (cqe->res >= 0) {
>> >                     fprintf(stdout, "buffer = <");
>> >                     fwrite(buffer, cqe->res, 1, stdout);
>> >                     fprintf(stdout, ">\n");
>> >                 }
>> >
>> >                 exit_loop = 1;
>> >             }
>> >         }
>> >
>> >         io_uring_cq_advance(&ring, count);
>> >     } while(exit_loop == 0);
>> >
>> >     io_uring_unregister_files(&ring);
>> >     io_uring_queue_exit(&ring);
>> >
>> >     return 0;
>> > }
>> >
>> > The output here is
>> > count = 1
>> > cqe->res = 1
>> > cqe->user_data = 1
>> > cqe->flags = 0
>> > count = 2
>> > cqe->res = 58
>> > cqe->user_data = 2
>> > cqe->flags = 0
>> > buffer = <initrd=....yada yada yada...>
>> >
>> > Is this the expected behaviour? If no, any hint? What am I doing wrong?
>> >
>> > If the expected behaviour is that IORING_OP_FILES_UPDATE can't be
>> > linked, how can I use it properly to issue a read or another op that
>> > requires to work with fds in a chain of operations?
>>
>> I think the sqe->flags should just be removed, as you're alluding to.
>> Care to send a patch for that? Then I can queue it up. We can even mark
>> it stable to ensure it gets back to older kernels.
>>
>> Probably just want to make it something ala:
>>
>> unsigned flags = READ_ONCE(sqe->flags);
>>
>> if (flags & (IOSQE_FIXED_FILE | IOSQE_BUFFER_SELECT))
>>         return -EINVAL;
>>
>> as those flags don't make sense, but the rest do.
>>
>> --
>> Jens Axboe
>>

TLDR;
Change done but the linked op still need to do not use
IOSQE_FIXED_FILE and use the original fd
I don't think that this is a problem at all because it still allows
the approach "submit & forget" as long as the first subsequent io op
is using the original fd and it's not using the registered index.

I can imagine people may expect to be able to use IOSQE_FIXED_FILE
with the linked op after IORING_OP_FILES_UPDATE but documenting it may
be enough also because the additional work may slow down various
things.

******
I applied the change but it wasn't still working as I was expecting,
after further investigation I discovered that it was trying to read
from the stdin because in my test case fd_index is 0.

When in sequence the userland code submits
- IORING_OP_FILES_UPDATE + IOSQE_IO_LINK
- IORING_OP_READ + IOSQE_FIXED_FILE

On the kernel side I am seeing (almost other calls):
- io_files_update_prep
- io_req_set_file
- io_read_prep
- io_files_update
- io_read

When io_req_set_file is executed, before io_files_update, the
registered files at offset 10 contains 0 that is the stdin.

This happens because, to my understanding, io_req_set_file it is
invoked by io_init_req in io_submit_sqes that is directly invoked by
the io_uring_enter syscal or the sq_thread.

For this to work it would have to re-fetch the new file pointer and
update the proper structs in req.

Having IORING_OP_FILES_UPDATE working with IOSQE_IO_LINK is already
great because it's just possible to invoke it and forget, using the
normal fd for the operations linked directly and without having really
keep tracking of the original fd: most likely if you are submitting
the files update you are doing it to submit an io op right after as
well so you probably have easy access or may have easy access to the
original fd.

With a test program I am now able to:
- submit IORING_OP_FILES_UPDATE with IOSQE_IO_LINK
- submit IORING_OP_READ (without IOSQE_FIXED_FILE)
- on completion
- submit IORING_OP_READ with IOSQE_FIXED_FILE

To be honest, I am not sure it's worth addressing this on the kernel
side because although I can easily imagine the expected behaviour is
"I set IORING_OP_FILES_UPDATE so I can use IOSQE_FIXED_FILE even in
the chain" (or at least it was mine).

This would require a series of changes for a very small number of use
cases, especially because the FD should be reloaded for all the ops in
the chain meanwhile they are processed:
- it's necessary to access the original fd so this would have to be
tracked down somewhere in io_kiocb and because all the io ops that can
use IOSQE_FIXED_FILE would require to potentially reload the fd it
would have to be added to a number of places
- it's of course possible to re-use something else but I don't have
the necessary expertise in io_uring to understand what
- all the ops using IOSQE_FIXED_FILE would have to check for
IOSQE_FIXED_FILE_RELOADFD and in case reload the fd

As consequence:
- if anything else can't be reused to store the fd, potentially grow io_kiocb
- slow down the io ops when have to reload the fd
- having to trigger an additional if because it would make sense
adding unlikely, it's and edge case and should be in the slow path

At the end, having to reload the fd may slightly impact the
performances of all the io ops and at the same time slow down a lot
the few operations for which would be required to reload the fd so
would just easier (and better) to pass the original fd in the
initially chained ops and use the registered fd index after.
******

What do you think?


Thanks!

Daniele



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux