Re: "Cannot allocate memory" on ring creation (not RLIMIT_MEMLOCK)

Dmitry Kadashev <dkadashev@xxxxxxxxx> · Mon, 21 Dec 2020 17:49:47 +0700

On Mon, Dec 21, 2020 at 5:35 PM Dmitry Kadashev <dkadashev@xxxxxxxxx> wrote:
>
> On Sun, Dec 20, 2020 at 7:59 AM Pavel Begunkov <asml.silence@xxxxxxxxx> wrote:
> >
> > On 20/12/2020 00:25, Jens Axboe wrote:
> > > On 12/19/20 4:42 PM, Pavel Begunkov wrote:
> > >> On 19/12/2020 23:13, Jens Axboe wrote:
> > >>> On 12/19/20 2:54 PM, Jens Axboe wrote:
> > >>>> On 12/19/20 1:51 PM, Josef wrote:
> > >>>>>> And even more so, it's IOSQE_ASYNC on the IORING_OP_READ on an eventfd
> > >>>>>> file descriptor. You probably don't want/mean to do that as it's
> > >>>>>> pollable, I guess it's done because you just set it on all reads for the
> > >>>>>> test?
> > >>>>>
> > >>>>> yes exactly, eventfd fd is blocking, so it actually makes no sense to
> > >>>>> use IOSQE_ASYNC
> > >>>>
> > >>>> Right, and it's pollable too.
> > >>>>
> > >>>>> I just tested eventfd without the IOSQE_ASYNC flag, it seems to work
> > >>>>> in my tests, thanks a lot :)
> > >>>>>
> > >>>>>> In any case, it should of course work. This is the leftover trace when
> > >>>>>> we should be exiting, but an io-wq worker is still trying to get data
> > >>>>>> from the eventfd:
> > >>>>>
> > >>>>> interesting, btw what kind of tool do you use for kernel debugging?
> > >>>>
> > >>>> Just poking at it and thinking about it, no hidden magic I'm afraid...
> > >>>
> > >>> Josef, can you try with this added? Looks bigger than it is, most of it
> > >>> is just moving one function below another.
> > >>
> > >> Hmm, which kernel revision are you poking? Seems it doesn't match
> > >> io_uring-5.10, and for 5.11 io_uring_cancel_files() is never called with
> > >> NULL files.
> > >>
> > >> if (!files)
> > >>      __io_uring_cancel_task_requests(ctx, task);
> > >> else
> > >>      io_uring_cancel_files(ctx, task, files);
> > >
> > > Yeah, I think I messed up. If files == NULL, then the task is going away.
> > > So we should cancel all requests that match 'task', not just ones that
> > > match task && files.
> > >
> > > Not sure I have much more time to look into this before next week, but
> > > something like that.
> > >
> > > The problem case is the async worker being queued, long before the task
> > > is killed and the contexts go away. But from exit_files(), we're only
> > > concerned with canceling if we have inflight. Doesn't look right to me.
> >
> > In theory all that should be killed in io_ring_ctx_wait_and_kill(),
> > of course that's if the ring itself is closed.
> >
> > Guys, do you share rings between processes? Explicitly like sending
> > io_uring fd over a socket, or implicitly e.g. sharing fd tables
> > (threads), or cloning with copying fd tables (and so taking a ref
> > to a ring).
>
> We do not share rings between processes. Our rings are accessible from different
> threads (under locks), but nothing fancy.

Actually, I'm wrong about the locks part, forgot how it works. In our case it
works like this: a parent thread creates a ring, and passes it to a worker
thread, which does all of the work with it, no locks are involved. On
(clean) termination the parent notifies the worker, waits for it to exit and
then calls io_uring_queue_exit. Not sure if that counts as sharing rings between
the threads or not.

As I've mentioned in some other email, I'll try (again) to make a reproducer.

-- 
Dmitry Kadashev