On Sat, 20 Aug 2022 at 00:43, Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote: > > On Sat, Aug 20, 2022 at 12:21:46AM +0200, Kumar Kartikeya Dwivedi wrote: > > On Fri, 19 Aug 2022 at 23:43, Alexei Starovoitov > > <alexei.starovoitov@xxxxxxxxx> wrote: > > > > > > From: Alexei Starovoitov <ast@xxxxxxxxxx> > > > > > > Use call_rcu_tasks_trace() to wait for sleepable progs to finish. > > > Then use call_rcu() to wait for normal progs to finish > > > and finally do free_one() on each element when freeing objects > > > into global memory pool. > > > > > > Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx> > > > --- > > > > I fear this can make OOM issues very easy to run into, because one > > sleepable prog that sleeps for a long period of time can hold the > > freeing of elements from another sleepable prog which either does not > > sleep often or sleeps for a very short period of time, and has a high > > update frequency. I'm mostly worried that unrelated sleepable programs > > not even using the same map will begin to affect each other. > > 'sleep for long time'? sleepable bpf prog doesn't mean that they can sleep. > sleepable progs can copy_from_user, but they're not allowed to waste time. It is certainly possible to waste time, but indirectly, not through the BPF program itself. If you have userfaultfd enabled (for unpriv users), an unprivileged user can trap a sleepable BPF prog (say LSM) using bpf_copy_from_user for as long as it wants. A similar case can be done using FUSE, IIRC. You can then say it's a problem about unprivileged users being able to use userfaultfd or FUSE, or we could think about fixing bpf_copy_from_user to return -EFAULT for this case, but it is totally possible right now for malicious userspace to extend the tasks trace gp like this for minutes (or even longer) on a system where sleepable BPF programs are using e.g. bpf_copy_from_user. > I don't share OOM concerns at all. > max_entries and memcg limits are still there and enforced. > dynamic map is strictly better and memory efficient than full prealloc. > > > Have you considered other options? E.g. we could directly expose > > bpf_rcu_read_lock/bpf_rcu_read_unlock to the program and enforce that > > access to RCU protected map lookups only happens in such read > > sections, and unlock invalidates all RCU protected pointers? Sleepable > > helpers can then not be invoked inside the BPF RCU read section. The > > program uses RCU read section while accessing such maps, and sleeps > > after doing bpf_rcu_read_unlock. They can be kfuncs. > > Yes. We can add explicit bpf_rcu_read_lock and teach verifier about RCU CS, > but I don't see the value specifically for sleepable progs. > Current sleepable progs can do map lookup without extra kfuncs. > Explicit CS would force progs to be rewritten which is not great. > > > It might also be useful in general, to access RCU protected data from > > sleepable programs (i.e. make some sections of the program RCU > > protected and non-sleepable at runtime). It will allow use of elements > > For other cases, sure. We can introduce RCU protected objects and > explicit bpf_rcu_read_lock. > > > from dynamically allocated maps with bpf_mem_alloc while not having to > > wait for RCU tasks trace grace period, which can extend into minutes > > (or even longer if unlucky). > > sleepable bpf prog that lasts minutes? In what kind of situation? > We don't have bpf_sleep() helper and not going to add one any time soon.