Re: "Cannot allocate memory" on ring creation (not RLIMIT_MEMLOCK)

On Wed, Dec 23, 2020 at 4:38 PM Dmitry Kadashev <dkadashev@xxxxxxxxx> wrote:
>
> On Wed, Dec 23, 2020 at 3:39 PM Dmitry Kadashev <dkadashev@xxxxxxxxx> wrote:
> >
> > On Tue, Dec 22, 2020 at 11:37 PM Pavel Begunkov <asml.silence@xxxxxxxxx> wrote:
> > >
> > > On 22/12/2020 11:04, Dmitry Kadashev wrote:
> > > > On Tue, Dec 22, 2020 at 11:11 AM Pavel Begunkov <asml.silence@xxxxxxxxx> wrote:
> > > [...]
> > > >>> What about smaller rings? Can you check io_uring of what SQ size it can allocate?
> > > >>> That can be a different program, e.g. modify a bit liburing/test/nop.
> > > > Unfortunately I've rebooted the box I've used for tests yesterday, so I can't
> > > > try this there. Also I was not able to come up with an isolated reproducer for
> > > > this yet.
> > > >
> > > > The good news is I've found a relatively easy way to provoke this on a test VM
> > > > using our software. Our app runs with "admin" user perms (plus some
> > > > capabilities), it bumps RLIMIT_MEMLOCK to infinity on start. I've also created
> > > > a user called 'ioutest' to run the check for ring sizes using a different user.
> > > >
> > > > I've modified the test program slightly, to show the number of rings
> > > > successfully created on each iteration and the actual error message (to debug
> > > > a problem I was having with it, but I've kept this after that). Here is the
> > > > output:
> > > >
> > > > # sudo -u admin bash -c 'ulimit -a' | grep locked
> > > > max locked memory       (kbytes, -l) 1024
> > > >
> > > > # sudo -u ioutest bash -c 'ulimit -a' | grep locked
> > > > max locked memory       (kbytes, -l) 1024
> > > >
> > > > # sudo -u admin ./iou-test1
> > > > Failed after 0 rings with 1024 size: Cannot allocate memory
> > > > Failed after 0 rings with 512 size: Cannot allocate memory
> > > > Failed after 0 rings with 256 size: Cannot allocate memory
> > > > Failed after 0 rings with 128 size: Cannot allocate memory
> > > > Failed after 0 rings with 64 size: Cannot allocate memory
> > > > Failed after 0 rings with 32 size: Cannot allocate memory
> > > > Failed after 0 rings with 16 size: Cannot allocate memory
> > > > Failed after 0 rings with 8 size: Cannot allocate memory
> > > > Failed after 0 rings with 4 size: Cannot allocate memory
> > > > Failed after 0 rings with 2 size: Cannot allocate memory
> > > > can't allocate 1
> > > >
> > > > # sudo -u ioutest ./iou-test1
> > > > max size 1024
> > >
> > > Then we screw that specific user. Interestingly, if it has CAP_IPC_LOCK
> > > capability we don't even account locked memory.
> >
> > We do have some capabilities, but not CAP_IPC_LOCK. Ours are:
> >
> > CAP_NET_ADMIN, CAP_NET_BIND_SERVICE, CAP_SYS_RESOURCE, CAP_KILL,
> > CAP_DAC_READ_SEARCH.
> >
> > The latter was necessary for integration with some third-party thing that we do
> > not really use anymore, so we can try building without it, but it'd require some
> > time, mostly because I'm not sure how quickly I'd be able to provoke the issue.
> >
> > > btw, do you use registered buffers?
> >
> > No, we use neither registered buffers nor registered files (nor anything
> > else).
> >
> > Also, I just tried the test program on a real box (this time one instance of our
> > program is still running - can repeat the check with it dead, but I expect the
> > results to be pretty much the same, at least after a few more restarts). This
> > box runs 5.9.5.
> >
> > # sudo -u admin bash -c 'ulimit -l'
> > 1024
> >
> > # sudo -u admin ./iou-test1
> > Failed after 0 rings with 1024 size: Cannot allocate memory
> > Failed after 0 rings with 512 size: Cannot allocate memory
> > Failed after 0 rings with 256 size: Cannot allocate memory
> > Failed after 0 rings with 128 size: Cannot allocate memory
> > Failed after 0 rings with 64 size: Cannot allocate memory
> > Failed after 0 rings with 32 size: Cannot allocate memory
> > Failed after 0 rings with 16 size: Cannot allocate memory
> > Failed after 0 rings with 8 size: Cannot allocate memory
> > Failed after 0 rings with 4 size: Cannot allocate memory
> > Failed after 0 rings with 2 size: Cannot allocate memory
> > can't allocate 1
> >
> > # sudo -u dmitry bash -c 'ulimit -l'
> > 1024
> >
> > # sudo -u dmitry ./iou-test1
> > max size 1024
>
> Please ignore the results from the real box above (5.9.5). The memlock limit
> interfered with this, since our app was running in the background and it had a
> few rings running (most failed to be created, but not all). I'll try to make it
> fully stuck and repeat the test with the app dead.

I've experimented a bit more with the 5.9 live boxes that were showing signs of
the problem, and I'm no longer sure they stay stuck until reboot.

I'm pretty sure that is the case with 5.6, but a bug was probably fixed since
then. The fact that 5.8 in particular had quite a few fixes that seemed
relevant is why we tried 5.9 in the first place.

And on 5.9 we might indeed be seeing memory fragmentation issues. I shouldn't
have been mixing my kernel versions :) Also, I did not realize that a ring of
size=1024 requires 16 contiguous pages. We will experiment and observe a bit
more, and meanwhile let's consider the case closed. If the issue surfaces again
I'll update this thread.

Thanks a *lot* Pavel for helping to debug this issue.

And sorry for the false alarm / noise everyone.

-- 
Dmitry Kadashev


