On Wed, Aug 24, 2022 at 12:50 PM Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> wrote:
>
> On Sat, 20 Aug 2022 at 01:01, Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> >
> > On Fri, Aug 19, 2022 at 3:56 PM Kumar Kartikeya Dwivedi
> > <memxor@xxxxxxxxx> wrote:
> > >
> > > On Sat, 20 Aug 2022 at 00:43, Alexei Starovoitov
> > > <alexei.starovoitov@xxxxxxxxx> wrote:
> > > >
> > > > On Sat, Aug 20, 2022 at 12:21:46AM +0200, Kumar Kartikeya Dwivedi wrote:
> > > > > On Fri, 19 Aug 2022 at 23:43, Alexei Starovoitov
> > > > > <alexei.starovoitov@xxxxxxxxx> wrote:
> > > > > >
> > > > > > From: Alexei Starovoitov <ast@xxxxxxxxxx>
> > > > > >
> > > > > > Use call_rcu_tasks_trace() to wait for sleepable progs to finish.
> > > > > > Then use call_rcu() to wait for normal progs to finish
> > > > > > and finally do free_one() on each element when freeing objects
> > > > > > into global memory pool.
> > > > > >
> > > > > > Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx>
> > > > > > ---
> > > > >
> > > > > I fear this can make OOM issues very easy to run into, because one
> > > > > sleepable prog that sleeps for a long period of time can hold the
> > > > > freeing of elements from another sleepable prog which either does not
> > > > > sleep often or sleeps for a very short period of time, and has a high
> > > > > update frequency. I'm mostly worried that unrelated sleepable programs
> > > > > not even using the same map will begin to affect each other.
> > > >
> > > > 'sleep for long time'? sleepable bpf prog doesn't mean that they can sleep.
> > > > sleepable progs can copy_from_user, but they're not allowed to waste time.
> > >
> > > It is certainly possible to waste time, but indirectly, not through
> > > the BPF program itself.
> > >
> > > If you have userfaultfd enabled (for unpriv users), an unprivileged
> > > user can trap a sleepable BPF prog (say LSM) using bpf_copy_from_user
> > > for as long as it wants. A similar case can be done using FUSE, IIRC.
> > >
> > > You can then say it's a problem about unprivileged users being able to
> > > use userfaultfd or FUSE, or we could think about fixing
> > > bpf_copy_from_user to return -EFAULT for this case, but it is totally
> > > possible right now for malicious userspace to extend the tasks trace
> > > gp like this for minutes (or even longer) on a system where sleepable
> > > BPF programs are using e.g. bpf_copy_from_user.
> >
> > Well in that sense userfaultfd can keep all sorts of things
> > in the kernel from making progress.
> > But nothing to do with OOM.
> > There is still the max_entries limit.
> > The amount of objects in waiting_for_gp is guaranteed to be less
> > than full prealloc.
>
> My thinking was that once you hold the GP using uffd, we can assume
> you will eventually hit a case where all such maps on the system have
> their max_entries exhausted. So yes, it probably won't OOM, but it
> would be bad regardless.
>
> I think this just begs instead that uffd (and even FUSE) should not be
> available to untrusted processes on the system by default. Both are
> used regularly to widen hard to hit race conditions in the kernel.
>
> But anyway, there's no easy way currently to guarantee the lifetime of
> elements for the sleepable case while being as low overhead as trace
> RCU, so it makes sense to go ahead with this.

Right. We evaluated SRCU for sleepable and it had too much overhead.
That's the reason rcu_tasks_trace was added and sleepable bpf progs
is the only user so far.
The point I'm arguing is that call_rcu_tasks_trace in this patch
doesn't add mm concerns more than the existing call_rcu.
There is CONFIG_PREEMPT_RCU and RT. uffd will cause similar
issues in such configs too.
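
To make the max_entries argument above concrete, here is a rough
userspace toy model. It is not kernel code and not taken from the patch.
Every name in it (toy_map, toy_update and so on) is hypothetical. It
also assumes that elements parked while waiting for a grace period still
count against max_entries, which is a simplification made for this
sketch; the real accounting may differ.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: NOT kernel code. All names here are hypothetical. */
struct toy_map {
	size_t max_entries;    /* hard cap on elements, like a map's max_entries */
	size_t in_use;         /* elements currently visible in the map */
	size_t waiting_for_gp; /* deleted elements whose grace period is pending */
};

/* Insert one element. Assumption of this sketch: parked elements still
 * count against the cap, so the insert fails once the sum reaches it. */
bool toy_update(struct toy_map *m)
{
	if (m->in_use + m->waiting_for_gp >= m->max_entries)
		return false;
	m->in_use++;
	return true;
}

/* Delete one element. It is parked until the grace period ends. */
bool toy_delete(struct toy_map *m)
{
	if (m->in_use == 0)
		return false;
	m->in_use--;
	m->waiting_for_gp++;
	return true;
}

/* The grace period has elapsed: parked elements become reusable. */
void toy_gp_end(struct toy_map *m)
{
	m->waiting_for_gp = 0;
}

/* Update-then-delete churn while the grace period is stalled.
 * Returns how many updates succeeded before the cap was hit. */
size_t toy_churn_while_stalled(struct toy_map *m, size_t attempts)
{
	size_t ok = 0;

	for (size_t i = 0; i < attempts; i++) {
		if (!toy_update(m))
			break;
		ok++;
		toy_delete(m);
	}
	return ok;
}
```

With max_entries = 4 and a stalled grace period, the churn loop stops
after 4 successful updates however many attempts are made. Memory held
stays bounded by the cap (no OOM), but updates fail until toy_gp_end()
runs, which is the "bad regardless" case from the quoted reply.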