I wonder if this is also related to one of the bug reports we received:
https://github.com/netty/netty-incubator-transport-io_uring/issues/14

> On 17. Dec 2020, at 09:19, Dmitry Kadashev <dkadashev@xxxxxxxxx> wrote:
>
> Hi,
>
> We've run into something that looks like a memory accounting problem in the
> kernel / io_uring code. We use multiple rings per process, and generally it
> works fine. Until it does not - new ring creation just fails with ENOMEM, and
> at that point it fails consistently until the box is rebooted.
>
> More details: we use multiple rings per process, typically initialized on
> process start (not necessarily, but that is not important here, so let's
> just assume all are initialized on process start). On a freshly booted box
> everything works fine. But after a while - and some process restarts -
> io_uring_queue_init() starts to fail with ENOMEM. Sometimes we see it fail
> and then subsequent calls succeed (in the same process), but over time it
> gets worse, and eventually no ring can be initialized. Once that happens,
> the only way to fix the problem is to reboot the box. Most of the restarts
> mentioned above are graceful: a new process is started and then the old one
> is killed, possibly with the KILL signal if it does not shut down in time.
> Things work fine for some time, but eventually we start getting those
> errors.
>
> Originally we used the 5.6.6 kernel, but given that quite a few accounting
> issues were fixed in io_uring in 5.8, we tried 5.9.5 as well; the issue is
> not gone.
>
> Just in case: everything else seems to be working fine. The application
> simply falls back to the thread pool instead of io_uring, and then
> everything continues to work.
>
> I was not able to spot anything suspicious in /proc/meminfo. We have
> RLIMIT_MEMLOCK set to infinity, and on a box that is currently experiencing
> the problem /proc/meminfo shows just 24 MB as locked.
>
> Any pointers on how we can debug this?
>
> Thanks,
> Dmitry
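For context, a minimal sketch of the initialize-or-fall-back pattern the report describes, assuming liburing; the entry count (256) and the fallback function itself are illustrative and not taken from the report:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <liburing.h>

/* Sketch: try to create a ring; on ENOMEM, signal the caller to fall
 * back to a thread pool, as the report describes. The entry count is
 * an assumption, not from the report. */
static int setup_ring_or_fallback(struct io_uring *ring)
{
    /* io_uring_queue_init() returns 0 on success and -errno on failure. */
    int ret = io_uring_queue_init(256, ring, 0);
    if (ret == 0)
        return 0;

    if (ret == -ENOMEM) {
        /* The failure mode from the report: creation fails with ENOMEM
         * even though RLIMIT_MEMLOCK is unlimited and /proc/meminfo
         * shows little locked memory. */
        fprintf(stderr, "ring setup: ENOMEM, falling back to thread pool\n");
    } else {
        fprintf(stderr, "ring setup: %s\n", strerror(-ret));
    }
    return ret;
}
```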