Re: [PATCH v3 bpf-next 13/15] bpf: Prepare bpf_mem_alloc to be used by sleepable bpf programs.

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Fri, 19 Aug 2022 16:01:08 -0700

On Fri, Aug 19, 2022 at 3:56 PM Kumar Kartikeya Dwivedi
<memxor@xxxxxxxxx> wrote:
>
> On Sat, 20 Aug 2022 at 00:43, Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> >
> > On Sat, Aug 20, 2022 at 12:21:46AM +0200, Kumar Kartikeya Dwivedi wrote:
> > > On Fri, 19 Aug 2022 at 23:43, Alexei Starovoitov
> > > <alexei.starovoitov@xxxxxxxxx> wrote:
> > > >
> > > > From: Alexei Starovoitov <ast@xxxxxxxxxx>
> > > >
> > > > Use call_rcu_tasks_trace() to wait for sleepable progs to finish.
> > > > Then use call_rcu() to wait for normal progs to finish
> > > > and finally do free_one() on each element when freeing objects
> > > > into global memory pool.
> > > >
> > > > Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx>
> > > > ---
> > >
> > > I fear this can make OOM issues very easy to run into, because one
> > > sleepable prog that sleeps for a long period of time can hold up the
> > > freeing of elements from another sleepable prog which either does not
> > > sleep often or sleeps for a very short period of time, and has a high
> > > update frequency. I'm mostly worried that unrelated sleepable programs
> > > not even using the same map will begin to affect each other.
> >
> > 'Sleep for a long time'? Being sleepable doesn't mean that a bpf prog can sleep.
> > Sleepable progs can copy_from_user, but they're not allowed to waste time.
>
> It is certainly possible to waste time, but indirectly, not through
> the BPF program itself.
>
> If you have userfaultfd enabled (for unpriv users), an unprivileged
> user can trap a sleepable BPF prog (say LSM) using bpf_copy_from_user
> for as long as it wants. The same can be done using FUSE, IIRC.
>
> You can then say it's a problem with unprivileged users being able to
> use userfaultfd or FUSE, or we could think about fixing
> bpf_copy_from_user to return -EFAULT for this case, but it is totally
> possible right now for malicious userspace to extend the tasks trace
> gp like this for minutes (or even longer) on a system where sleepable
> BPF programs are using e.g. bpf_copy_from_user.

Well, in that sense userfaultfd can keep all sorts of things
in the kernel from making progress.
But that has nothing to do with OOM.
There is still the max_entries limit.
The number of objects in waiting_for_gp is guaranteed to be less
than what full prealloc would allocate.
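
To make the bound concrete, here is a rough userspace sketch. It is
a toy model, not the bpf_mem_alloc code from this series. The struct
and every toy_* name are hypothetical. It assumes that an update
fails once max_entries elements are either in use or parked in
waiting_for_gp. Under that assumption a stalled grace period makes
updates fail, but it never grows memory past the cap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names exist in the kernel. */
struct toy_pool {
	size_t max_entries;    /* hard cap on elements (assumed) */
	size_t free_cnt;       /* elements available for reuse */
	size_t in_use;         /* elements currently held by the map */
	size_t waiting_for_gp; /* freed, parked until the gp ends */
};

static void toy_pool_init(struct toy_pool *p, size_t max_entries)
{
	p->max_entries = max_entries;
	p->free_cnt = max_entries;
	p->in_use = 0;
	p->waiting_for_gp = 0;
}

/* Take one element; fail rather than grow when nothing is free. */
static bool toy_alloc(struct toy_pool *p)
{
	if (p->free_cnt == 0)
		return false;
	p->free_cnt--;
	p->in_use++;
	return true;
}

/* Release one element; it is not reusable until the gp completes. */
static bool toy_free(struct toy_pool *p)
{
	if (p->in_use == 0)
		return false;
	p->in_use--;
	p->waiting_for_gp++;
	/* The invariant being illustrated. */
	assert(p->waiting_for_gp <= p->max_entries);
	return true;
}

/* Grace period completed: parked elements become reusable again. */
static void toy_gp_done(struct toy_pool *p)
{
	p->free_cnt += p->waiting_for_gp;
	p->waiting_for_gp = 0;
}

/*
 * Update-heavy workload while the grace period is stalled.
 * Returns the peak value of waiting_for_gp that was observed.
 */
static size_t toy_churn_with_stalled_gp(struct toy_pool *p, size_t attempts)
{
	size_t peak = 0;

	for (size_t i = 0; i < attempts; i++) {
		if (!toy_alloc(p))
			continue; /* the update fails; no memory is added */
		toy_free(p);
		if (p->waiting_for_gp > peak)
			peak = p->waiting_for_gp;
	}
	return peak;
}
```

With max_entries = 8 and 1000 attempted updates, the peak is 8:
once every element is parked, toy_alloc() fails until toy_gp_done()
runs. The cost of a long grace period in this model is failed
updates, not unbounded memory.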