On 12/18/20 2:20 AM, Dmitry Kadashev wrote:
> On Thu, Dec 17, 2020 at 8:43 PM Victor Stewart <v@nametag.social> wrote:
>>
>> On Thu, Dec 17, 2020 at 11:12 AM Dmitry Kadashev <dkadashev@xxxxxxxxx> wrote:
>>>
>>> On Thu, Dec 17, 2020 at 5:38 PM Josef <josef.grieb@xxxxxxxxx> wrote:
>>>>
>>>> > That is curious. This ticket mentions Shmem though, and in our case it
>>>> > does not look suspicious at all. E.g. on a box that has the problem at
>>>> > the moment: Shmem: 41856 kB. The box has 256GB of RAM.
>>>> >
>>>> > But I'd (given my lack of knowledge) expect the issues to be related
>>>> > anyway.
>>>>
>>>> What about Mapped? Mapped is pretty high (1GB) on my machine. I'm still
>>>> trying to reproduce that in C... However, the user process is killed but
>>>> not the io_wq_worker kernel processes - that's also the reason why the
>>>> server socket is still listening (even though the user process is
>>>> killed). The bug only occurs (in netty) with a high number of operations
>>>> and when using eventfd_write to unblock
>>>> io_uring_enter(IORING_ENTER_GETEVENTS).
>>>>
>>>> (tested on kernels 5.9 and 5.10)
>>>
>>> Stats from another box with this problem (also 256G of RAM):
>>>
>>> Mlocked:     17096 kB
>>> Mapped:     171480 kB
>>> Shmem:       41880 kB
>>>
>>> Does not look suspicious at a glance. The number of io_wq* processes is
>>> 23-31.
>>>
>>> Uptime is 27 days, 24 rings per process. The process was restarted 4
>>> times, and 3 out of those four times the old instance was killed with
>>> SIGKILL. On the last process start 18 rings failed to initialize, but
>>> after that 6 more were initialized successfully - this was before the
>>> old instance was killed. Maybe it's related to the load and the number
>>> of io-wq processes, e.g. some of them exited and a few more rings could
>>> be initialized successfully.
>>
>> have you tried using IORING_SETUP_ATTACH_WQ?
>>
>> https://lkml.org/lkml/2020/1/27/763
>
> No, I have not, but while using that might help to slow down the
> progression of the issue, it won't fix it - at least if I understand
> correctly. The problem is not that those rings can't be created at all -
> there is no problem with that on a freshly booted box - but rather that
> after some (potentially abrupt) owning process terminations under load
> the kernel gets into a state where - eventually - no new rings can be
> created at all. Not a single one. In the above example the issue just
> hasn't progressed far enough yet.
>
> In other words, there seems to be a leak / accounting problem in the
> io_uring code that is triggered by abrupt process termination under load
> (i.e. no io_uring_queue_exit?) - this is not a usage problem.

Right, I don't think that's related at all. Might be a good idea in general
depending on your use case, but it won't really have any bearing on the
particular issue at hand.

-- 
Jens Axboe
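
For anyone who wants to try the IORING_SETUP_ATTACH_WQ suggestion above, a
minimal liburing sketch of what "attaching" means is given below: a second
ring shares the first ring's io-wq backend instead of creating another
worker pool, and both rings are torn down explicitly with
io_uring_queue_exit(). This is an illustration, not code from the thread;
it assumes liburing is available, a kernel with IORING_SETUP_ATTACH_WQ
support (5.6+), and the queue depth of 64 is arbitrary.

#include <liburing.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	struct io_uring ring_a, ring_b;
	struct io_uring_params p;
	int ret;

	/* First ring: created normally, owns the io-wq worker pool. */
	ret = io_uring_queue_init(64, &ring_a, 0);
	if (ret < 0) {
		fprintf(stderr, "queue_init: %s\n", strerror(-ret));
		return 1;
	}

	/* Second ring: attach to the first ring's io-wq via
	 * IORING_SETUP_ATTACH_WQ rather than creating a new worker pool. */
	memset(&p, 0, sizeof(p));
	p.flags = IORING_SETUP_ATTACH_WQ;
	p.wq_fd = ring_a.ring_fd;
	ret = io_uring_queue_init_params(64, &ring_b, &p);
	if (ret < 0) {
		fprintf(stderr, "queue_init_params: %s\n", strerror(-ret));
		io_uring_queue_exit(&ring_a);
		return 1;
	}

	/* ... submit and reap work on either ring ... */

	/* Explicit teardown, as opposed to relying on cleanup after an
	 * abrupt termination (e.g. SIGKILL) as described in the thread. */
	io_uring_queue_exit(&ring_b);
	io_uring_queue_exit(&ring_a);
	return 0;
}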