On 27-May 22:33, Alexei Starovoitov wrote:
> From: Alexei Starovoitov <ast@xxxxxxxxxx>
>
> Introduce sleepable BPF programs that can request such property for themselves
> via BPF_F_SLEEPABLE flag at program load time. In such case they will be able
> to use helpers like bpf_copy_from_user() that might sleep. At present only
> fentry/fexit/fmod_ret and lsm programs can request to be sleepable and only
> when they are attached to kernel functions that are known to allow sleeping.
>
> The non-sleepable programs are relying on implicit rcu_read_lock() and
> migrate_disable() to protect life time of programs, maps that they use and
> per-cpu kernel structures used to pass info between bpf programs and the
> kernel. The sleepable programs cannot be enclosed into rcu_read_lock().
> migrate_disable() maps to preempt_disable() in non-RT kernels, so the progs
> should not be enclosed in migrate_disable() as well. Therefore bpf_srcu is used
> to protect the life time of sleepable progs.
>
> There are many networking and tracing program types. In many cases the
> 'struct bpf_prog *' pointer itself is rcu protected within some other kernel
> data structure and the kernel code is using rcu_dereference() to load that
> program pointer and call BPF_PROG_RUN() on it. All these cases are not touched.
> Instead sleepable bpf programs are allowed with bpf trampoline only. The
> program pointers are hard-coded into generated assembly of bpf trampoline and
> synchronize_srcu(&bpf_srcu) is used to protect the life time of the program.
> The same trampoline can hold both sleepable and non-sleepable progs.
>
> When bpf_srcu lock is held it means that some sleepable bpf program is running
> from bpf trampoline. Those programs can use bpf arrays and preallocated hash/lru
> maps.
> These map types are waiting on programs to complete via
> synchronize_srcu(&bpf_srcu);
>
> Updates to trampoline now has to do synchronize_srcu + synchronize_rcu_tasks
> to wait for sleepable progs to finish and for trampoline assembly to finish.
>
> In the future srcu will be replaced with upcoming rcu_trace.
> That will complete the first step of introducing sleepable progs.
>
> After that dynamically allocated hash maps can be allowed. All map elements
> would have to be srcu protected instead of normal rcu.
> per-cpu maps will be allowed. Either via the following pattern:
> void *elem = bpf_map_lookup_elem(map, key);
> if (elem) {
>    // access elem
>    bpf_map_release_elem(map, elem);
> }
> where modified lookup() helper will do migrate_disable() and
> new bpf_map_release_elem() will do corresponding migrate_enable().
> Or explicit bpf_migrate_disable/enable() helpers will be introduced.
>
> Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx>

Thanks! This will be really helpful for LSM programs.

Acked-by: KP Singh <kpsingh@xxxxxxxxxx>

> ---
>  arch/x86/net/bpf_jit_comp.c    | 36 +++++++++++++++-------
>  include/linux/bpf.h            |  4 +++
>  include/uapi/linux/bpf.h       |  8 +++++
>  kernel/bpf/arraymap.c          |  5 +++
>  kernel/bpf/hashtab.c           | 19 ++++++++----
>  kernel/bpf/syscall.c           | 12 ++++++--
>  kernel/bpf/trampoline.c        | 33 +++++++++++++++++++-
>  kernel/bpf/verifier.c          | 56 ++++++++++++++++++++++++++--------
>  tools/include/uapi/linux/bpf.h |  8 +++++
>  9 files changed, 147 insertions(+), 34 deletions(-)

[...]

> +		if (ret)
> +			verbose(env, "%s() is not modifiable\n",
> +				prog->aux->attach_func_name);
> +	} else if (prog->aux->sleepable && prog->type == BPF_PROG_TYPE_TRACING) {
> +		/* fentry/fexit progs can be sleepable only if they are
> +		 * attached to ALLOW_ERROR_INJECTION or security_*() funcs.
> +		 * LSM progs check that they are attached to bpf_lsm_*() funcs
> +		 * which are sleepable too.

I know of one LSM hook which is not sleepable and is executed in an
RCU callback, i.e. task_free.
I don't think it's a problem to run under SRCU for that (I tried it and
it does not cause any issues). We can add a blacklisting mechanism later
for the sleepable flags or just the sleeping helpers (based on some of
the work going on to whitelist functions for helper usage).

- KP

> +		 */
> +		ret = check_attach_modify_return(prog, addr);
> +		if (ret)
> +			verbose(env, "%s is not sleepable\n",

[...]

>   * two extensions:
>   *
> --
> 2.23.0
>
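
To make the blacklisting idea above concrete, here is a rough userspace
sketch of the attach-time rule described in the quoted verifier comment,
extended with a deny list for LSM hooks that must not sleep. This is not
the kernel code: the enum, the function names, the deny list and the
"bpf_lsm_task_free" symbol name are all assumptions made up for
illustration, and the error-injection check is reduced to a boolean
supplied by the caller.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the program types discussed in the thread. */
enum prog_type { PROG_TRACING, PROG_LSM, PROG_OTHER };

/* Hypothetical deny list: LSM hooks that run in a non-sleepable context.
 * task_free is the one named above; the symbol name is an assumption. */
static const char *const nonsleepable_lsm_hooks[] = {
	"bpf_lsm_task_free",
};

static bool has_prefix(const char *s, const char *prefix)
{
	return strncmp(s, prefix, strlen(prefix)) == 0;
}

static bool lsm_hook_denied(const char *func)
{
	size_t n = sizeof(nonsleepable_lsm_hooks) /
		   sizeof(nonsleepable_lsm_hooks[0]);

	for (size_t i = 0; i < n; i++)
		if (strcmp(func, nonsleepable_lsm_hooks[i]) == 0)
			return true;
	return false;
}

/* Returns true if a program of the given type may be loaded as sleepable
 * when attached to attach_func. error_injectable models "the function is
 * marked ALLOW_ERROR_INJECTION". */
static bool may_be_sleepable(enum prog_type type, const char *attach_func,
			     bool error_injectable)
{
	switch (type) {
	case PROG_TRACING:
		/* fentry/fexit: error-injectable or security_*() funcs only */
		return error_injectable ||
		       has_prefix(attach_func, "security_");
	case PROG_LSM:
		/* LSM: bpf_lsm_*() funcs, minus the deny list */
		return has_prefix(attach_func, "bpf_lsm_") &&
		       !lsm_hook_denied(attach_func);
	default:
		return false;
	}
}
```

With this model, may_be_sleepable(PROG_LSM, "bpf_lsm_task_free", false)
is rejected while other bpf_lsm_*() hooks are still accepted. Whether the
deny list should gate the sleepable flag itself or only the sleeping
helpers is the open question from the comment above; the sketch only
shows the first option.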