On 11/4/22 8:32 AM, KP Singh wrote:
On Fri, Nov 4, 2022 at 4:27 PM Alexei Starovoitov <ast@xxxxxxxx> wrote:
On 11/4/22 5:13 AM, KP Singh wrote:
On Thu, Nov 3, 2022 at 11:17 PM Kumar Kartikeya Dwivedi
<memxor@xxxxxxxxx> wrote:
On Thu, Nov 03, 2022 at 12:51:18PM IST, Yonghong Song wrote:
A new bpf_type_flag MEM_RCU is added to indicate a PTR_TO_BTF_ID
object access needing rcu_read_lock protection. The rcu protection
is not needed for non-sleepable programs, so the various verification
checks are only done for sleepable programs. In particular, only
the following insns can be inside the bpf_rcu_read_lock() region:
- any non-call insns except BPF_ABS/BPF_IND
- non-sleepable helpers and kfuncs.
Also, the bpf_*_storage_get() helpers' 5th hidden argument (the memory
allocation flag) should be GFP_ATOMIC.
If a pointer (PTR_TO_BTF_ID) is marked as rcu, then any use of
this pointer and the load which gets this pointer need to be
protected by bpf_rcu_read_lock(). The following shows a couple
of examples:
struct task_struct {
...
struct task_struct __rcu *real_parent;
struct css_set __rcu *cgroups;
...
};
struct css_set {
...
struct cgroup *dfl_cgrp;
...
}
...
task = bpf_get_current_task_btf();
cgroups = task->cgroups;
dfl_cgroup = cgroups->dfl_cgrp;
... using dfl_cgroup ...
bpf_rcu_read_lock()/bpf_rcu_read_unlock() should be added as below to
avoid verification failures.
task = bpf_get_current_task_btf();
bpf_rcu_read_lock();
cgroups = task->cgroups;
dfl_cgroup = cgroups->dfl_cgrp;
bpf_rcu_read_unlock();
... using dfl_cgroup ...
The following is another example for task->real_parent.
task = bpf_get_current_task_btf();
bpf_rcu_read_lock();
real_parent = task->real_parent;
... bpf_task_storage_get(&map, real_parent, 0, 0);
bpf_rcu_read_unlock();
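For a fuller picture, the above could look like the sketch below as a
complete sleepable program. The section name, hook, map definition and
the __ksym kfunc declarations are illustrative assumptions, not taken
from this patch:

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

struct {
        __uint(type, BPF_MAP_TYPE_TASK_STORAGE);
        __uint(map_flags, BPF_F_NO_PREALLOC);
        __type(key, int);
        __type(value, long);
} map SEC(".maps");

/* assumption: rcu lock/unlock exposed to bpf programs as kfuncs */
void bpf_rcu_read_lock(void) __ksym;
void bpf_rcu_read_unlock(void) __ksym;

SEC("lsm.s/file_open")          /* sleepable program */
int BPF_PROG(on_file_open, struct file *file)
{
        struct task_struct *task, *real_parent;
        long *ptr;

        task = bpf_get_current_task_btf();
        bpf_rcu_read_lock();
        /* real_parent is an __rcu tagged field, so the load and all
         * uses of the result stay inside the rcu region.
         */
        real_parent = task->real_parent;
        /* inside the rcu region the hidden allocation flag of
         * bpf_task_storage_get() is expected to be GFP_ATOMIC
         */
        ptr = bpf_task_storage_get(&map, real_parent, 0,
                                   BPF_LOCAL_STORAGE_GET_F_CREATE);
        if (ptr)
                *ptr += 1;
        bpf_rcu_read_unlock();
        return 0;
}

char _license[] SEC("license") = "GPL";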
There is another case observed in selftest bpf_iter_ipv6_route.c:
struct fib6_info *rt = ctx->rt;
...
fib6_nh = &rt->fib6_nh[0]; // Not rcu protected
...
if (rt->nh)
fib6_nh = &nh->nh_info->fib6_nh; // rcu protected
...
... using fib6_nh ...
Currently verification will fail with
same insn cannot be used with different pointers
since the use of fib6_nh is tagged with rcu in one path
but not in the other path. The above use case is a valid
one, so the verifier is changed to ignore the MEM_RCU type tag
in such cases.
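Just to illustrate what "ignore the MEM_RCU type tag" means here (this
is not the actual verifier change, and the flag value is made up), the
comparison of the two pointer types can be thought of as masking the
tag out before comparing:

#include <stdbool.h>

#define MEM_RCU (1U << 13)  /* hypothetical flag bit, for illustration only */

/* Treat an rcu-tagged and an untagged PTR_TO_BTF_ID of the same base
 * type as the same pointer kind when checking whether one load insn is
 * reused with "different" pointers on two paths.
 */
static bool same_ptr_kind_ignoring_rcu(unsigned int t1, unsigned int t2)
{
        return (t1 & ~MEM_RCU) == (t2 & ~MEM_RCU);
}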
Signed-off-by: Yonghong Song <yhs@xxxxxx>
---
include/linux/bpf.h | 3 +
include/linux/bpf_verifier.h | 1 +
kernel/bpf/btf.c | 11 +++
kernel/bpf/verifier.c | 126 ++++++++++++++++++++++++++++++++---
4 files changed, 133 insertions(+), 8 deletions(-)
[...]
This isn't right. Every load that obtains an RCU pointer needs to become tied to
the current RCU section, and needs to be invalidated once the RCU section ends.
So in addition to checking that bpf_rcu_read_lock is held around MEM_RCU access,
you need to invalidate all MEM_RCU pointers when bpf_rcu_read_unlock is called.
Otherwise, with the current logic, the following would become possible:
bpf_rcu_read_lock();
p = rcu_dereference(foo->rcup);
bpf_rcu_read_unlock();
// p is possibly dead
bpf_rcu_read_lock();
// use p
bpf_rcu_read_unlock();
What do we want to do about cases like:
bpf_rcu_read_lock();
q = rcu_dereference(foo->rcup);
bpf_rcu_read_lock();
This one should be rejected for simplicity.
Let's not complicate things with nested cs-s.
Agreed, the current logic tries to count the number of active
critical sections; the verifier should just reject the program if
there is a nested bpf_rcu_read_lock() call.
Okay, will not allow nested bpf_rcu_read_lock() for now.
p = rcu_dereference(foo->rcup);
bpf_rcu_read_unlock();
// Use q
// Use p
bpf_rcu_read_unlock();
I think this is probably implied in your statement, but just to make it
clear: the invalidation needs to happen only when the outermost
bpf_rcu_read_unlock is called, i.e. when active_rcu_lock goes back down to 0.
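To make the bookkeeping being discussed concrete, here is a rough,
self-contained sketch (not the actual verifier code; the struct, the
function names and the MEM_RCU bit value are made up for illustration):
reject a nested bpf_rcu_read_lock(), and on bpf_rcu_read_unlock()
invalidate every register that still carries an rcu-tagged pointer.

#include <stdbool.h>

#define MAX_BPF_REG     11
#define MEM_RCU         (1U << 13)      /* hypothetical flag bit */

struct toy_reg_state {
        unsigned int type;              /* base type plus flags */
        bool valid;
};

struct toy_verifier_state {
        struct toy_reg_state regs[MAX_BPF_REG];
        int active_rcu_lock;            /* 0 or 1 since nesting is rejected */
};

static int toy_rcu_read_lock(struct toy_verifier_state *st)
{
        if (st->active_rcu_lock)
                return -1;              /* nested bpf_rcu_read_lock(): reject */
        st->active_rcu_lock = 1;
        return 0;
}

static int toy_rcu_read_unlock(struct toy_verifier_state *st)
{
        int i;

        if (!st->active_rcu_lock)
                return -1;              /* unlock without a matching lock */
        st->active_rcu_lock = 0;
        /* Leaving the critical section: rcu-tagged pointers obtained
         * inside it must not be usable afterwards.
         */
        for (i = 0; i < MAX_BPF_REG; i++)
                if (st->regs[i].type & MEM_RCU)
                        st->regs[i].valid = false;
        return 0;
}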