On Tue, Aug 9, 2022 at 11:38 AM Hao Luo <haoluo@xxxxxxxxxx> wrote:
>
> On Tue, Aug 9, 2022 at 9:23 AM Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> >
> > On Mon, Aug 08, 2022 at 05:56:57PM -0700, Hao Luo wrote:
> > > On Mon, Aug 8, 2022 at 5:19 PM Andrii Nakryiko
> > > <andrii.nakryiko@xxxxxxxxx> wrote:
> > > >
> > > > On Fri, Aug 5, 2022 at 2:49 PM Hao Luo <haoluo@xxxxxxxxxx> wrote:
> > > > >
> > > > > Cgroup_iter is a type of bpf_iter. It walks over cgroups in four modes:
> > > > >
> > > > >  - walking a cgroup's descendants in pre-order.
> > > > >  - walking a cgroup's descendants in post-order.
> > > > >  - walking a cgroup's ancestors.
> > > > >  - process only the given cgroup.
> > >
> > > [...]
> > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > > > index 59a217ca2dfd..4d758b2e70d6 100644
> > > > > --- a/include/uapi/linux/bpf.h
> > > > > +++ b/include/uapi/linux/bpf.h
> > > > > @@ -87,10 +87,37 @@ struct bpf_cgroup_storage_key {
> > > > >         __u32   attach_type;    /* program attach type (enum bpf_attach_type) */
> > > > >  };
> > > > >
> > > > > +enum bpf_iter_order {
> > > > > +       BPF_ITER_ORDER_DEFAULT = 0,     /* default order. */
> > > >
> > > > why is this default order necessary? It just adds confusion (I had to
> > > > look up source code to know what is default order). I might have
> > > > missed some discussion, so if there is some very good reason, then
> > > > please document this in commit message. But I'd rather not do some
> > > > magical default order instead. We can set 0 to mean invalid and error
> > > > out, or just do SELF as the very first value (and if user forgot to
> > > > specify more fancy mode, they hopefully will quickly discover this in
> > > > their testing).
> > > >
> > >
> > > PRE/POST/UP are tree-specific orders. SELF applies on all iters and
> > > yields only a single object. How does task_iter express a non-self
> > > order? By non-self, I mean something like "I don't care about the
> > > order, just scan _all_ the objects". And this "don't care" order, IMO,
> > > may be the common case. I don't think everyone cares about walking
> > > order for tasks. The DEFAULT is intentionally put at the first value,
> > > so that if users don't care about order, they don't have to specify
> > > this field.
> > >
> > > If that sounds valid, maybe using "UNSPEC" instead of "DEFAULT" is better?
> >
> > I agree with Andrii.
> > This:
> > +       if (order == BPF_ITER_ORDER_DEFAULT)
> > +               order = BPF_ITER_DESCENDANTS_PRE;
> >
> > looks like an arbitrary choice.
> > imo
> > BPF_ITER_DESCENDANTS_PRE = 0,
> > would have been more obvious. No need to dig into definition of "default".
> >
> > UNSPEC = 0
> > is fine too if we want user to always be conscious about the order
> > and the kernel will error if that field is not initialized.
> > That would be my preference, since it will match the rest of uapi/bpf.h
> >
>
> Sounds good. In the next version, will use
>
> enum bpf_iter_order {
>         BPF_ITER_ORDER_UNSPEC = 0,
>         BPF_ITER_SELF_ONLY,             /* process only a single object. */
>         BPF_ITER_DESCENDANTS_PRE,       /* walk descendants in pre-order. */
>         BPF_ITER_DESCENDANTS_POST,      /* walk descendants in post-order. */
>         BPF_ITER_ANCESTORS_UP,          /* walk ancestors upward. */
> };
>

Sigh, I find that having UNSPEC=0 and erroring out when seeing UNSPEC
doesn't work. Basically, if we have a non-iter prog and a cgroup_iter
prog written in the same source file, I can't use
bpf_object__attach_skeleton to attach them. Because the default
prog_attach_fn for iter initializes `order` to 0 (that is, UNSPEC),
which is going to be rejected by the kernel. In order to make
bpf_object__attach_skeleton work on cgroup_iter, I think I need to use
the following

enum bpf_iter_order {
        BPF_ITER_DESCENDANTS_PRE,       /* walk descendants in pre-order. */
        BPF_ITER_DESCENDANTS_POST,      /* walk descendants in post-order. */
        BPF_ITER_ANCESTORS_UP,          /* walk ancestors upward. */
        BPF_ITER_SELF_ONLY,             /* process only a single object. */
};

So that when calling bpf_object__attach_skeleton() on cgroup_iter, a
link can be generated and the generated link defaults to pre-order
walk on the whole hierarchy. Is there a better solution?

> and explicitly list the values acceptable by cgroup_iter, error out if
> UNSPEC is detected.
>
> Also, following Andrii's comments, will change BPF_ITER_SELF to
> BPF_ITER_SELF_ONLY, which does seem a little bit explicit in
> comparison.
>
> > I applied the first 3 patches to ease respin.
>
> Thanks! This helps!
>
> > Thanks!
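
To make the zero-initialization problem described above concrete, here
is a minimal user-space sketch. It is not kernel or libbpf code: every
name ending in _sketch, the two check_order_* helpers, and the struct
layout are hypothetical stand-ins chosen for illustration. The only
assumption taken from the thread is that a default attach path leaves
`order` at 0. The sketch compares how the two enum layouts treat that
zero value.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical mirror of the second layout above; not the real UAPI. */
enum iter_order_sketch {
	SKETCH_DESCENDANTS_PRE = 0,	/* first value, so zero-init selects it */
	SKETCH_DESCENDANTS_POST,
	SKETCH_ANCESTORS_UP,
	SKETCH_SELF_ONLY,
};

/* Hypothetical stand-in for the per-link parameters of an iterator. */
struct iter_link_info_sketch {
	unsigned int cgroup_fd;
	unsigned int order;
};

/* Models a generic attach path that zero-fills the link parameters. */
static struct iter_link_info_sketch default_link_info(void)
{
	struct iter_link_info_sketch info;

	memset(&info, 0, sizeof(info));
	return info;
}

/* Layout A: 0 means UNSPEC and is rejected; 1..4 are the real orders. */
static int check_order_unspec_first(unsigned int order)
{
	if (order == 0 || order > 4)
		return -EINVAL;
	return 0;
}

/* Layout B: 0 is DESCENDANTS_PRE, so a zero-filled request is valid. */
static int check_order_pre_first(unsigned int order)
{
	if (order > SKETCH_SELF_ONLY)
		return -EINVAL;
	return 0;
}
```

With these definitions, passing default_link_info().order to
check_order_unspec_first() yields -EINVAL, while
check_order_pre_first() accepts it as a pre-order walk. The trade-off
is the one raised earlier in the thread: layout A forces callers to
pick an order explicitly, layout B lets an uninitialized field select
a walk order silently.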