Re: [PATCH bpf-next 2/4] bpf: Introduce kptr_rcu.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Feb 15, 2023 at 07:58:10AM CET, Alexei Starovoitov wrote:
> From: Alexei Starovoitov <ast@xxxxxxxxxx>
>
> The life time of certain kernel structures like 'struct cgroup' is protected by RCU.
> Hence it's safe to dereference them directly from kptr-s in bpf maps.
> The resulting pointer is PTR_TRUSTED and can be passed to kfuncs that expect KF_TRUSTED_ARGS.

But I thought PTR_TRUSTED was meant to ensure that the refcount is always > 0?
E.g. in [0] you said that kernel code should ensure refcount is held while
passing trusted pointer as tracepoint args. It's also clear from what functions
that operate on PTR_TRUSTED are doing. bpf_cgroup_acquire is doing css_get and
not css_tryget. Similarly bpf_cgroup_ancestor also calls cgroup_get which does
css_get instead of css_tryget.

  [0]: https://lore.kernel.org/bpf/CAADnVQJfj9mrFZ+mBfwh8Xba333B6EyHRMdb6DE4s6te_5_V_A@xxxxxxxxxxxxxx

And if we want to do RCU + css_tryget, we already have that in the form of
kptr_get.

I think we've had a similar discussion about this in
https://lore.kernel.org/bpf/20220216214405.tn7thpnvqkxuvwhd@xxxxxxxxxxxxxxxxxxxxxxxxxxxx,
where you advised against directly assuming pointers to RCU protected objects as
trusted where refcount could drop to 0. So then we went the kptr_get route,
because explicit RCU sections weren't available back then to load + inc_not_zero
directly (for sleepable programs).

> Derefrence of other kptr-s returns PTR_UNTRUSTED.
>
> For example:
> struct map_value {
>    struct cgroup __kptr_rcu *cgrp;
> };
>
> SEC("tp_btf/cgroup_mkdir")
> int BPF_PROG(test_cgrp_get_ancestors, struct cgroup *cgrp_arg, const char *path)
> {
>   struct cgroup *cg, *cg2;
>
>   cg = bpf_cgroup_acquire(cgrp_arg); // cg is PTR_TRUSTED and ref_obj_id > 0
>   bpf_kptr_xchg(&v->cgrp, cg);
>
>   cg2 = v->cgrp; // cg2 is PTR_TRUSTED | MEM_RCU. This is new feature introduced by this patch.
>
>   bpf_cgroup_ancestor(cg2, level); // safe to do. cg2 will not disappear
							^^ But it's percpu_ref
							can drop to zero, right?

> }
>
> Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx>
> ---
> [...]



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux