On 1/21/22 11:53 AM, Andrii Nakryiko wrote:
On Fri, Jan 21, 2022 at 11:31 AM Kenny Yu <kennyyu@xxxxxx> wrote:
This adds a helper for bpf programs to read the memory of other
tasks. This also adds the ability for bpf iterator programs to
be sleepable.
This changes `bpf_iter_run_prog` to use the appropriate synchronization for
sleepable bpf programs. With sleepable bpf iterator programs, we can no
longer use `rcu_read_lock()` and must use `rcu_read_lock_trace()` instead
to protect the bpf program.
As an example use case at Meta, we are using a bpf task iterator program
and this new helper to print C++ async stack traces for all threads of
a given process.
Signed-off-by: Kenny Yu <kennyyu@xxxxxx>
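To make the use case concrete, here is a rough sketch (not from this patch) of a sleepable task iterator that uses the proposed helper to read one word from each task's address space. It assumes libbpf accepts an "iter.s/task" section for sleepable iterator programs and that the new helper declaration has been regenerated into bpf_helper_defs.h; dump_task_word and target_uaddr are made-up names:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "GPL";

/* Illustrative only: user-space address to read in each task, set by the loader. */
const volatile __u64 target_uaddr;

SEC("iter.s/task")	/* ".s" marks the iterator program as sleepable */
int dump_task_word(struct bpf_iter__task *ctx)
{
	struct seq_file *seq = ctx->meta->seq;
	struct task_struct *task = ctx->task;
	__u64 word = 0;
	long err;

	if (!task || !target_uaddr)
		return 0;

	/* Proposed helper: copy sizeof(word) bytes from task's address space. */
	err = bpf_copy_from_user_task(&word, sizeof(word),
				      (const void *)target_uaddr, task, 0);
	if (err)
		return 0;	/* on error, word stays zeroed */

	BPF_SEQ_PRINTF(seq, "pid %d: 0x%llx\n", task->tgid, word);
	return 0;
}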
---
include/linux/bpf.h | 1 +
include/uapi/linux/bpf.h | 10 ++++++++++
kernel/bpf/bpf_iter.c | 20 ++++++++++++++-----
kernel/bpf/helpers.c | 35 ++++++++++++++++++++++++++++++++++
kernel/trace/bpf_trace.c | 2 ++
tools/include/uapi/linux/bpf.h | 10 ++++++++++
6 files changed, 73 insertions(+), 5 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 80e3387ea3af..5917883e528b 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2229,6 +2229,7 @@ extern const struct bpf_func_proto bpf_kallsyms_lookup_name_proto;
extern const struct bpf_func_proto bpf_find_vma_proto;
extern const struct bpf_func_proto bpf_loop_proto;
extern const struct bpf_func_proto bpf_strncmp_proto;
+extern const struct bpf_func_proto bpf_copy_from_user_task_proto;
const struct bpf_func_proto *tracing_prog_func_proto(
enum bpf_func_id func_id, const struct bpf_prog *prog);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index fe2272defcd9..d82d9423874d 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5049,6 +5049,15 @@ union bpf_attr {
* This helper is currently supported by cgroup programs only.
* Return
* 0 on success, or a negative error in case of failure.
+ *
+ * long bpf_copy_from_user_task(void *dst, u32 size, const void *user_ptr, struct task_struct *tsk, u64 flags)
+ * Description
+ * Read *size* bytes from user space address *user_ptr* in *tsk*'s
+ * address space, and store the data in *dst*. *flags* is not
+ * used yet and is provided for future extensibility. This helper
+ * can only be used by sleepable programs.
"On error dst buffer is zeroed out."? This is an explicit guarantee.
+ * Return
+ * 0 on success, or a negative error in case of failure.
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -5239,6 +5248,7 @@ union bpf_attr {
FN(get_func_arg_cnt), \
FN(get_retval), \
FN(set_retval), \
+ FN(copy_from_user_task), \
/* */
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/bpf/bpf_iter.c b/kernel/bpf/bpf_iter.c
index b7aef5b3416d..110029ede71e 100644
--- a/kernel/bpf/bpf_iter.c
+++ b/kernel/bpf/bpf_iter.c
@@ -5,6 +5,7 @@
#include <linux/anon_inodes.h>
#include <linux/filter.h>
#include <linux/bpf.h>
+#include <linux/rcupdate_trace.h>
struct bpf_iter_target_info {
struct list_head list;
@@ -684,11 +685,20 @@ int bpf_iter_run_prog(struct bpf_prog *prog, void *ctx)
{
int ret;
- rcu_read_lock();
- migrate_disable();
- ret = bpf_prog_run(prog, ctx);
- migrate_enable();
- rcu_read_unlock();
+ if (prog->aux->sleepable) {
+ rcu_read_lock_trace();
+ migrate_disable();
+ might_fault();
+ ret = bpf_prog_run(prog, ctx);
+ migrate_enable();
+ rcu_read_unlock_trace();
+ } else {
+ rcu_read_lock();
+ migrate_disable();
+ ret = bpf_prog_run(prog, ctx);
+ migrate_enable();
+ rcu_read_unlock();
+ }
I think this sleepable bpf_iter change deserves its own patch. It has
nothing to do with the bpf_copy_from_user_task() helper.
Without the above change, using bpf_copy_from_user_task() will trigger
an RCU warning and may produce incorrect results. One option is to put
the above in a preparation patch before introducing
bpf_copy_from_user_task(), so we won't have bisection issues.
/* bpf program can only return 0 or 1:
* 0 : okay
[...]