Re: io_uring_prep_openat_direct() and link/drain

On 3/29/22 12:31 PM, Miklos Szeredi wrote:
> On Tue, 29 Mar 2022 at 20:26, Jens Axboe <axboe@xxxxxxxxx> wrote:
>>
>> On 3/29/22 12:21 PM, Miklos Szeredi wrote:
>>> On Tue, 29 Mar 2022 at 19:04, Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>>
>>>> On 3/29/22 10:08 AM, Jens Axboe wrote:
>>>>> On 3/29/22 7:20 AM, Miklos Szeredi wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I'm trying to read multiple files with io_uring and getting stuck,
>>>>>> because the link and drain flags don't seem to do what they are
>>>>>> documented to do.
>>>>>>
>>>>>> Kernel is v5.17 and liburing is compiled from the git tree at
>>>>>> 7a3a27b6a384 ("add tests for nonblocking accept sockets").
>>>>>>
>>>>>> Without those flags the attached example works some of the time, but
>>>>>> that's probably accidental since ordering is not ensured.
>>>>>>
>>>>>> Adding the drain or link flags makes it even worse (fails in cases that
>>>>>> the unordered one didn't).
>>>>>>
>>>>>> What am I missing?
>>>>>
>>>>> I don't think you're missing anything, it looks like a bug. What you
>>>>> want here is:
>>>>>
>>>>> prep_open_direct(sqe);
>>>>> sqe->flags |= IOSQE_IO_LINK;
>>>>> ...
>>>>> prep_read(sqe);
>>>
>>> So with the below merge this works. But if instead I do
>>>
>>> prep_open_direct(sqe);
>>>  ...
>>> prep_read(sqe);
>>> sqe->flags |= IOSQE_IO_DRAIN;
>>>
>>> then it doesn't.  Shouldn't drain have a stronger ordering guarantee than link?
>>
>> I didn't test that, but I bet it's running into the same kind of issue
>> wrt prep. Are you getting -EBADF? The drain will indeed ensure that
>> _execution_ doesn't start until the previous requests have completed,
>> but it's still prepared before.
>>
>> For your use case, IO_LINK is what you want and that must work.
>>
>> I'll check the drain case just in case, it may in fact work if you just
>> edit the code base you're running now and remove these two lines from
>> io_init_req():
>>
>> if (unlikely(!req->file)) {
>> -        if (!ctx->submit_state.link.head)
>> -                return -EBADF;
>>         req->result = fd;
>>         req->flags |= REQ_F_DEFERRED_FILE;
>> }
>>
>> to not make it dependent on link.head. Probably not a bad idea in
>> general, as the rest of the handlers have been audited for req->file
>> usage in prep.
> 
> Nope, that results in the following Oops:
> 
> BUG: kernel NULL pointer dereference, address: 0000000000000044
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0
> Oops: 0000 [#1] SMP PTI
> CPU: 3 PID: 1126 Comm: readfiles Not tainted 5.17.0-00065-g3287b182c9c3-dirty #623
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-29-g6a62e0cb0dfe-prebuilt.qemu.org 04/01/2014
> RIP: 0010:io_rw_init_file+0x15/0x170
> Code: 00 6d 22 82 0f 95 c0 83 c0 02 c3 66 2e 0f 1f 84 00 00 00 00 00
> 0f 1f 44 00 00 41 55 41 54 55 53 4c 8b 2f 4c 8b 67 58 8b 6f 20 <41> 23
> 75 44 0f 84 28 01 00 00 48 89 fb f6 47 44 01 0f 84 08 01 00
> RSP: 0018:ffffc9000108fba8 EFLAGS: 00010207
> RAX: 0000000000000001 RBX: ffff888103ddd688 RCX: ffffc9000108fc18
> RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff888103ddd600
> RBP: 0000000000000000 R08: ffffc9000108fbd8 R09: 00007ffffffff000
> R10: 0000000000020000 R11: 000056012e2ce2e0 R12: ffff88810276b800
> R13: 0000000000000000 R14: 0000000000000000 R15: ffff888103ddd600
> FS:  00007f9058d72580(0000) GS:ffff888237d80000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000044 CR3: 0000000100966004 CR4: 0000000000370ee0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  <TASK>
>  io_read+0x65/0x4d0
>  ? select_task_rq_fair+0x602/0xf20
>  ? newidle_balance.constprop.0+0x2ff/0x3a0
>  io_issue_sqe+0xd86/0x21a0
>  ? __schedule+0x228/0x610
>  ? timerqueue_del+0x2a/0x40
>  io_req_task_submit+0x26/0x100
>  tctx_task_work+0x172/0x4b0
>  task_work_run+0x5c/0x90
>  io_cqring_wait+0x48d/0x790
>  ? io_eventfd_put+0x20/0x20
>  __do_sys_io_uring_enter+0x28d/0x5e0
>  ? __cond_resched+0x16/0x40
>  ? task_work_run+0x61/0x90
>  do_syscall_64+0x3b/0x90
>  entry_SYSCALL_64_after_hwframe+0x44/0xae

Ah yes, that makes sense, since I only wired the prep file part up for
links. Forgot about that... Let me test, I'll see if it's feasible to do
for drain and send you an incremental.
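
To make the prep vs. execution distinction above concrete, here is a toy
userspace model. It is only a sketch and is not kernel code: every name in
it (toy_ring, toy_req, toy_prep, toy_issue, toy_open_then_read, the
allow_defer switch) is hypothetical and made up for illustration. It
assumes that both requests are prepared at submission time, before either
one executes, and that the open only fills the fixed file slot when it
executes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NR_SLOTS 4

/* Hypothetical stand-ins; none of these are real kernel types. */
struct toy_file { int id; };

struct toy_ring {
    struct toy_file *slots[NR_SLOTS];  /* fixed file table, starts empty */
};

enum toy_op { TOY_OPEN_DIRECT, TOY_READ };

struct toy_req {
    enum toy_op op;
    int slot;               /* fixed file slot the request refers to */
    struct toy_file *file;  /* resolved at prep, or at issue if deferred */
    int deferred;           /* set when the lookup is postponed to issue */
};

static struct toy_file the_file = { 42 };

/* Prep runs at submission time, before any request has executed. */
static int toy_prep(struct toy_ring *r, struct toy_req *req, int allow_defer)
{
    assert(req->slot >= 0 && req->slot < NR_SLOTS);
    req->file = NULL;
    req->deferred = 0;
    if (req->op == TOY_OPEN_DIRECT)
        return 0;                       /* open needs no existing file */
    req->file = r->slots[req->slot];
    if (!req->file) {
        if (!allow_defer)
            return -EBADF;              /* slot is still empty at prep */
        req->deferred = 1;              /* look the file up again at issue */
    }
    return 0;
}

/* Issue runs in order, after the previous request has completed. */
static int toy_issue(struct toy_ring *r, struct toy_req *req)
{
    if (req->op == TOY_OPEN_DIRECT) {
        r->slots[req->slot] = &the_file;  /* install into the fixed slot */
        return 0;
    }
    if (req->deferred)
        req->file = r->slots[req->slot];
    if (!req->file)
        return -EBADF;
    return req->file->id;               /* stands in for a successful read */
}

/* Prepare open + read on the same slot, then execute them in order. */
static int toy_open_then_read(int allow_defer)
{
    struct toy_ring ring = { { NULL } };
    struct toy_req open_req = { TOY_OPEN_DIRECT, 0, NULL, 0 };
    struct toy_req read_req = { TOY_READ, 0, NULL, 0 };
    int ret;

    ret = toy_prep(&ring, &open_req, allow_defer);
    if (ret < 0)
        return ret;
    ret = toy_prep(&ring, &read_req, allow_defer);
    if (ret < 0)
        return ret;
    ret = toy_issue(&ring, &open_req);
    if (ret < 0)
        return ret;
    return toy_issue(&ring, &read_req);
}
```

In this model toy_open_then_read(0) fails with -EBADF even though the two
requests execute strictly in order, because the read looked at the slot
during prep. With toy_open_then_read(1) the lookup is postponed until issue
and the read sees the file. Ordering execution alone is not enough; the
issue path also has to cope with a file that was not resolved at prep.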

-- 
Jens Axboe



