On 12/19/20 9:29 AM, Jens Axboe wrote:
> On 12/19/20 9:13 AM, Jens Axboe wrote:
>> On 12/18/20 7:49 PM, Josef wrote:
>>>> I'm happy to run _any_ reproducer, so please do let us know if you
>>>> manage to find something that I can run with netty. As long as it
>>>> includes instructions for exactly how to run it :-)
>>>
>>> cool :) I just created a repo for that:
>>> https://github.com/1Jo1/netty-io_uring-kernel-debugging.git
>>>
>>> - install JDK 1.8
>>> - to run netty: ./mvnw compile exec:java -Dexec.mainClass="uring.netty.example.EchoUringServer"
>>> - to run the echo test: cargo run --release -- --address "127.0.0.1:2022" --number 200 --duration 20 --length 300
>>>   (https://github.com/haraldh/rust_echo_bench.git)
>>> - kill -9 the process
>>>
>>> the async flag is enabled and these operations are used: OP_READ,
>>> OP_WRITE, OP_POLL_ADD, OP_CLOSE, OP_ACCEPT
>>>
>>> (btw you can change the port in EchoUringServer.java)
>>
>> This is great! Not sure this is the same issue, but what I see here is
>> that we have leftover workers when the test is killed. This means the
>> rings aren't gone, and the memory isn't freed (and unaccounted), which
>> would ultimately lead to problems of course, similar to just an
>> accounting bug or race.
>>
>> The above _seems_ to be related to IOSQE_ASYNC. Trying to narrow it
>> down...
>
> Further narrowed down, it seems to be related to IOSQE_ASYNC on the
> read requests. I'm guessing there are cases where we end up not
> canceling them on ring close, hence the ring stays active, etc.
>
> If I just add a hack to clear IOSQE_ASYNC on IORING_OP_READ, then
> the test terminates fine on the kill -9.

And even more specifically, it's IOSQE_ASYNC on the IORING_OP_READ of an
eventfd file descriptor. You probably don't want/mean to do that, as the
eventfd is pollable; I guess it's done because you just set the flag on
all reads for the test? In any case, it should of course work.

This is the leftover trace when we should be exiting, but an io-wq
worker is still trying to get data from the eventfd:

$ sudo cat /proc/2148/stack
[<0>] eventfd_read+0x160/0x260
[<0>] io_iter_do_read+0x1b/0x40
[<0>] io_read+0xa5/0x320
[<0>] io_issue_sqe+0x23c/0xe80
[<0>] io_wq_submit_work+0x6e/0x1a0
[<0>] io_worker_handle_work+0x13d/0x4e0
[<0>] io_wqe_worker+0x2aa/0x360
[<0>] kthread+0x130/0x160
[<0>] ret_from_fork+0x1f/0x30

That read will never finish at this point; it should have been canceled.

--
Jens Axboe
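
[Editor's note: for reference, below is a minimal standalone sketch of the
submission pattern discussed above, an IORING_OP_READ against an eventfd
with IOSQE_ASYNC forced. It is illustrative only, not taken from Josef's
repo or the netty transport, and assumes liburing is available.]

#include <liburing.h>
#include <sys/eventfd.h>
#include <stdint.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	uint64_t val;
	int evfd;

	if (io_uring_queue_init(8, &ring, 0) < 0)
		return 1;

	/* eventfd is pollable, so a read would normally be poll-armed */
	evfd = eventfd(0, 0);
	if (evfd < 0)
		return 1;

	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_read(sqe, evfd, &val, sizeof(val), 0);
	sqe->flags |= IOSQE_ASYNC;	/* force the read to an io-wq worker */

	io_uring_submit(&ring);

	/*
	 * Nothing ever writes to the eventfd, so the io-wq worker blocks in
	 * eventfd_read(). When this process exits (or is killed), the
	 * pending read should be canceled; per the report above, it isn't,
	 * and the worker plus the ring's memory stick around.
	 */
	return 0;
}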