Re: "Cannot allocate memory" on ring creation (not RLIMIT_MEMLOCK)

On Sun, Dec 20, 2020 at 12:11 AM Jens Axboe <axboe@xxxxxxxxx> wrote:
>
> On 12/19/20 9:29 AM, Jens Axboe wrote:
> > On 12/19/20 9:13 AM, Jens Axboe wrote:
> >> On 12/18/20 7:49 PM, Josef wrote:
> >>>> I'm happy to run _any_ reproducer, so please do let us know if you
> >>>> manage to find something that I can run with netty. As long as it
> >>>> includes instructions for exactly how to run it :-)
> >>>
> >>> cool :)  I just created a repo for that:
> >>> https://github.com/1Jo1/netty-io_uring-kernel-debugging.git
> >>>
> >>> - install jdk 1.8
> >>> - to run netty: ./mvnw compile exec:java
> >>> -Dexec.mainClass="uring.netty.example.EchoUringServer"
> >>> - to run the echo test: cargo run --release -- --address
> >>> "127.0.0.1:2022" --number 200 --duration 20 --length 300
> >>> (https://github.com/haraldh/rust_echo_bench.git)
> >>> - kill the netty process with kill -9
> >>>
> >>> The async flag is enabled and these operations are used: OP_READ,
> >>> OP_WRITE, OP_POLL_ADD, OP_CLOSE, OP_ACCEPT
> >>>
> >>> (btw you can change the port in EchoUringServer.java)
> >>
> >> This is great! Not sure this is the same issue, but what I see here is
> >> that we have leftover workers when the test is killed. This means the
> >> rings aren't gone, and the memory isn't freed (and unaccounted), which
> >> would ultimately lead to problems of course, similar to just an
> >> accounting bug or race.
> >>
> >> The above _seems_ to be related to IOSQE_ASYNC. Trying to narrow it
> >> down...
> >
> > Further narrowed down, it seems to be related to IOSQE_ASYNC on the
> > read requests. I'm guessing there are cases where we end up not
> > canceling them on ring close, hence the ring stays active, etc.
> >
> > If I just add a hack to clear IOSQE_ASYNC on IORING_OP_READ, then
> > the test terminates fine on the kill -9.
>
> And even more so, it's IOSQE_ASYNC on the IORING_OP_READ on an eventfd
> file descriptor.

In our case - unlike netty - we use io_uring only for disk IO, with no eventfd.
We also do not use IOSQE_ASYNC: we tried it, but that coincided with some kernel
crashes, so we have disabled it for now (we are not yet sure whether the crashes
are related).

I'll try (again) to build a simpler reproducer for our issue, which is probably
different from the netty one.

-- 
Dmitry Kadashev