This is a followup of sleepable bpf_timer[0].

When discussing sleepable bpf_timer, the suggestion was to give bpf_wq a
try, as the two APIs are similar but distinct enough to justify a new
one.

So here it is.

I tried to keep as much common code as possible in kernel/bpf/helpers.c,
but I couldn't avoid some code duplication in kernel/bpf/verifier.c.

This series introduces basic bpf_wq support:
- creation is supported
- assignment is supported
- running a simple bpf_wq is also supported.

We will probably need to extend the API further with:
- a full delayed_work API (which could be piggybacked on top with the
  appropriate flag)
- bpf_wq_cancel()
- bpf_wq_cancel_sync() (for sleepable programs)
- documentation

But for now, let's focus on what we currently have, to see whether it's
worth it compared to sleepable bpf_timer.

FWIW, I still have a couple of concerns with this implementation:
- I'm explicitly declaring the async callback as sleepable or not
  through a flag (BPF_F_WQ_SLEEPABLE). Is it really worth it? Or should
  I just assume that any wq runs in a sleepable context?
- bpf_wq_work() accesses ->prog without protection, and I think this
  might race with bpf_wq_set_callback(). Consider the following:

  CPU 0                                 CPU 1
  bpf_wq_set_callback()
  bpf_wq_start()
                                        bpf_wq_work():
                                          prog = cb->prog;
  bpf_wq_set_callback()
    cb->prog = prog;
    bpf_prog_put(prev)
    rcu_assign_pointer(cb->callback_fn,
                       callback_fn);
                                          callback = READ_ONCE(w->cb.callback_fn);

  As I understand it, callback_fn is fine, and prog might be, but we
  clearly have an inconsistency between "prog" and "callback_fn", as
  they can come from two different bpf_wq_set_callback() calls.

  IMO we should protect this with async->lock, but I'm not sure whether
  that's OK or not.

---

For reference, the use cases I have in mind:

---

Basically, I need to be able to defer a HID-BPF program for the
following reasons (from the aforementioned patch):

1. defer an event:
   Sometimes we receive an out-of-proximity event, but the device cannot
   be trusted enough, and we need to ensure that we won't receive another
   one in the following n milliseconds. So we need to wait those n
   milliseconds, and eventually re-inject that event into the stack.

2. inject new events in reaction to one given event:
   We might want to transform one given event into several. This is the
   case for macro keys, where a single key press is supposed to send a
   sequence of key presses. But this could also be used to patch a faulty
   behavior, if a device forgets to send a release event.

3. communicate with the device in reaction to one event:
   We might want to communicate back to the device after a given event.
   For example, a device might send us an event saying that it came back
   from a sleeping state and needs to be re-initialized.

Currently we can achieve that by keeping a userspace program around,
raising a bpf event, and letting that userspace program inject the events
and commands. However, we are keeping that program alive as a daemon only
to schedule commands. There is no logic in it, so it doesn't really
justify an actual userspace wakeup. A kernel workqueue seems simpler to
handle.

bpf_timers currently run in a soft IRQ context; this series provides a
sleepable context instead, through bpf_wq.
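Use case 2 (one event expanded into several) is easy to picture in isolation. The following is a plain userspace C sketch, not HID-BPF code. `struct key_event`, `struct macro` and `expand_macro()` are hypothetical names. Modelling each emitted event as a (code, pressed) pair is an assumption made only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event model: a key code plus press (true) or release (false). */
struct key_event {
	unsigned int code;
	bool pressed;
};

/* Hypothetical macro: one trigger key expands into a fixed key sequence. */
struct macro {
	unsigned int trigger;
	const unsigned int *keys;
	size_t nkeys;
};

/*
 * Expand a trigger key into a press/release pair for every key of the
 * matching macro, in order. Returns the number of events written to out,
 * or 0 if code is not a trigger or out cannot hold the whole sequence.
 */
static size_t expand_macro(const struct macro *m, size_t nmacros,
			   unsigned int code, struct key_event *out, size_t cap)
{
	assert(out != NULL || cap == 0);

	for (size_t i = 0; i < nmacros; i++) {
		if (m[i].trigger != code)
			continue;
		if (cap < 2 * m[i].nkeys)
			return 0; /* refuse to emit a truncated sequence */
		for (size_t k = 0; k < m[i].nkeys; k++) {
			out[2 * k] = (struct key_event){ m[i].keys[k], true };
			out[2 * k + 1] = (struct key_event){ m[i].keys[k], false };
		}
		return 2 * m[i].nkeys;
	}
	return 0;
}
```

For a three-key macro this yields six events: a press followed by a release of each key. How, and from which context, those events would be re-injected into the HID stack is outside this sketch.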
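The prog/callback_fn inconsistency described in the concerns, and the suggested fix of taking async->lock, can be illustrated with a small userspace model. This is a sketch under stated assumptions, not kernel code:
- `fake_prog`, `fake_async`, `fake_set_callback()` and `fake_work_snapshot()` are hypothetical names.
- A pthread mutex stands in for async->lock.

The model shows that if the writer updates both fields under the lock, and the reader copies both under the same lock, the reader always sees a pair coming from a single set-callback call.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins: fake_prog models a BPF program and fake_cb_t its
 * callback. Neither is a kernel type. */
struct fake_prog {
	int id;
};

typedef int (*fake_cb_t)(int);

/* Shared state written by the set-callback path and read by the worker.
 * The mutex plays the role of async->lock. */
struct fake_async {
	pthread_mutex_t lock;
	struct fake_prog *prog;
	fake_cb_t callback_fn;
};

/* Writer side (models bpf_wq_set_callback()): publish prog and callback_fn
 * together. Returns the previous prog so the caller can release it after the
 * lock is dropped, mirroring bpf_prog_put(prev) in the race diagram. */
static struct fake_prog *fake_set_callback(struct fake_async *a,
					   struct fake_prog *prog,
					   fake_cb_t fn)
{
	struct fake_prog *prev;

	assert(prog != NULL && fn != NULL);
	pthread_mutex_lock(&a->lock);
	prev = a->prog;
	a->prog = prog;
	a->callback_fn = fn;
	pthread_mutex_unlock(&a->lock);
	return prev;
}

/* Reader side (models the start of bpf_wq_work()): copy both fields under
 * the same lock, so the pair always comes from one fake_set_callback() call.
 * Keeping prog alive after the lock is dropped is not modelled. */
static void fake_work_snapshot(struct fake_async *a,
			       struct fake_prog **prog, fake_cb_t *fn)
{
	pthread_mutex_lock(&a->lock);
	*prog = a->prog;
	*fn = a->callback_fn;
	pthread_mutex_unlock(&a->lock);
}

/* Two example callbacks, each meant to be paired with its own fake_prog. */
static int cb_add_one(int x) { return x + 1; }
static int cb_double(int x) { return x * 2; }
```

A mutex is only a userspace convenience here. bpf_wq_set_callback() can be called from non-sleepable BPF programs, so the kernel side would presumably keep using a spinlock.

The model also leaves out the other half of the race in the diagram, where bpf_prog_put(prev) could release the program the worker has just read. Handling that would additionally require the worker to keep prog alive once the lock is released, for instance by taking a reference before dropping it.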
Cheers,
Benjamin

To: Alexei Starovoitov <ast@xxxxxxxxxx>
To: Daniel Borkmann <daniel@xxxxxxxxxxxxx>
To: Andrii Nakryiko <andrii@xxxxxxxxxx>
To: Martin KaFai Lau <martin.lau@xxxxxxxxx>
To: Eduard Zingerman <eddyz87@xxxxxxxxx>
To: Song Liu <song@xxxxxxxxxx>
To: Yonghong Song <yonghong.song@xxxxxxxxx>
To: John Fastabend <john.fastabend@xxxxxxxxx>
To: KP Singh <kpsingh@xxxxxxxxxx>
To: Stanislav Fomichev <sdf@xxxxxxxxxx>
To: Hao Luo <haoluo@xxxxxxxxxx>
To: Jiri Olsa <jolsa@xxxxxxxxxx>
To: Mykola Lysenko <mykolal@xxxxxx>
To: Shuah Khan <shuah@xxxxxxxxxx>
Cc: <bpf@xxxxxxxxxxxxxxx>
Cc: <linux-kernel@xxxxxxxxxxxxxxx>
Cc: <linux-kselftest@xxxxxxxxxxxxxxx>
Signed-off-by: Benjamin Tissoires <bentiss@xxxxxxxxxx>

[0] https://lore.kernel.org/all/20240408-hid-bpf-sleepable-v6-0-0499ddd91b94@xxxxxxxxxx/

---
Benjamin Tissoires (18):
      bpf: trampoline: export __bpf_prog_enter/exit_recur
      bpf: make timer data struct more generic
      bpf: replace bpf_timer_init with a generic helper
      bpf: replace bpf_timer_set_callback with a generic helper
      bpf: replace bpf_timer_cancel_and_free with a generic helper
      bpf: add support for bpf_wq user type
      tools: sync include/uapi/linux/bpf.h
      bpf: add support for KF_ARG_PTR_TO_WORKQUEUE
      bpf: allow struct bpf_wq to be embedded in arraymaps and hashmaps
      selftests/bpf: add bpf_wq tests
      bpf: wq: add bpf_wq_init
      tools: sync include/uapi/linux/bpf.h
      selftests/bpf: wq: add bpf_wq_init() checks
      bpf/verifier: add is_sleepable argument to push_callback_call
      bpf: wq: add bpf_wq_set_callback_impl
      selftests/bpf: add checks for bpf_wq_set_callback()
      bpf: add bpf_wq_start
      selftests/bpf: wq: add bpf_wq_start() checks

 include/linux/bpf.h                               |  17 +-
 include/linux/bpf_verifier.h                      |   1 +
 include/uapi/linux/bpf.h                          |  13 +
 kernel/bpf/arraymap.c                             |  18 +-
 kernel/bpf/btf.c                                  |  17 +
 kernel/bpf/hashtab.c                              |  55 ++-
 kernel/bpf/helpers.c                              | 371 ++++++++++++++++-----
 kernel/bpf/syscall.c                              |  16 +-
 kernel/bpf/trampoline.c                           |   6 +-
 kernel/bpf/verifier.c                             | 195 ++++++++++-
 tools/include/uapi/linux/bpf.h                    |  13 +
 tools/testing/selftests/bpf/bpf_experimental.h    |   7 +
 .../selftests/bpf/bpf_testmod/bpf_testmod.c       |   5 +
 .../selftests/bpf/bpf_testmod/bpf_testmod_kfunc.h |   1 +
 tools/testing/selftests/bpf/prog_tests/wq.c       |  41 +++
 tools/testing/selftests/bpf/progs/wq.c            | 192 +++++++++++
 tools/testing/selftests/bpf/progs/wq_failures.c   | 197 +++++++++++
 17 files changed, 1052 insertions(+), 113 deletions(-)
---
base-commit: ffa6b26b4d8a0520b78636ca9373ab842cb3b1a8
change-id: 20240411-bpf_wq-fe24e8d24f5e

Best regards,
-- 
Benjamin Tissoires <bentiss@xxxxxxxxxx>