On Tue, May 5, 2020 at 5:54 PM Alexei Starovoitov <ast@xxxxxx> wrote:
>
> On 5/5/20 5:14 PM, Yonghong Song wrote:
> >
> >
> > On 5/5/20 2:30 PM, Andrii Nakryiko wrote:
> >> On Sun, May 3, 2020 at 11:26 PM Yonghong Song <yhs@xxxxxx> wrote:
> >>>
> >>> Given a bpf program, the step to create an anonymous bpf iterator is:
> >>>   - create a bpf_iter_link, which combines bpf program and the target.
> >>>     In the future, there could be more information recorded in the
> >>>     link.
> >>>     A link_fd will be returned to the user space.
> >>>   - create an anonymous bpf iterator with the given link_fd.
> >>>
> >>> The bpf_iter_link can be pinned to bpffs mount file system to
> >>> create a file based bpf iterator as well.
> >>>
> >>> The benefit to use of bpf_iter_link:
> >>>   - using bpf link simplifies design and implementation as bpf link
> >>>     is used for other tracing bpf programs.
> >>>   - for file based bpf iterator, bpf_iter_link provides a standard
> >>>     way to replace underlying bpf programs.
> >>>   - for both anonymous and free based iterators, bpf link query
> >>>     capability can be leveraged.
> >>>
> >>> The patch added support of tracing/iter programs for BPF_LINK_CREATE.
> >>> A new link type BPF_LINK_TYPE_ITER is added to facilitate link
> >>> querying. Currently, only prog_id is needed, so there is no
> >>> additional in-kernel show_fdinfo() and fill_link_info() hook
> >>> is needed for BPF_LINK_TYPE_ITER link.
> >>>
> >>> Signed-off-by: Yonghong Song <yhs@xxxxxx>
> >>> ---
> >>
> >> LGTM. See small nit about __GFP_NOWARN.
> >>
> >> Acked-by: Andrii Nakryiko <andriin@xxxxxx>
> >>
> >>
> >>>   include/linux/bpf.h            |  1 +
> >>>   include/linux/bpf_types.h      |  1 +
> >>>   include/uapi/linux/bpf.h       |  1 +
> >>>   kernel/bpf/bpf_iter.c          | 62 ++++++++++++++++++++++++++++++++++
> >>>   kernel/bpf/syscall.c           | 14 ++++++++
> >>>   tools/include/uapi/linux/bpf.h |  1 +
> >>>   6 files changed, 80 insertions(+)
> >>>
> >>
> >> [...]
> >>
> >>> +int bpf_iter_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
> >>> +{
> >>> +	struct bpf_link_primer link_primer;
> >>> +	struct bpf_iter_target_info *tinfo;
> >>> +	struct bpf_iter_link *link;
> >>> +	bool existed = false;
> >>> +	u32 prog_btf_id;
> >>> +	int err;
> >>> +
> >>> +	if (attr->link_create.target_fd || attr->link_create.flags)
> >>> +		return -EINVAL;
> >>> +
> >>> +	prog_btf_id = prog->aux->attach_btf_id;
> >>> +	mutex_lock(&targets_mutex);
> >>> +	list_for_each_entry(tinfo, &targets, list) {
> >>> +		if (tinfo->btf_id == prog_btf_id) {
> >>> +			existed = true;
> >>> +			break;
> >>> +		}
> >>> +	}
> >>> +	mutex_unlock(&targets_mutex);
> >>> +	if (!existed)
> >>> +		return -ENOENT;
> >>> +
> >>> +	link = kzalloc(sizeof(*link), GFP_USER | __GFP_NOWARN);
> >>
> >> nit: all existing link implementation don't specify __GFP_NOWARN,
> >> wonder if bpf_iter_link should be special?
> >
> > Nothing special. Just feel __GFP_NOWARN is the right thing to do to
> > avoid pollute dmesg since -ENOMEM is returned to user space. But in
> > reality, unlike some key/value allocation where the size could be huge
> > and __GFP_NOWARN might be more useful, here, sizeof(*link) is fixed
> > and small, __GFP_NOWARN probably not that useful.
> >
> > Will drop it.
>
> actually all existing user space driven allocation have nowarn.

Can you define "user space driven"? I understand why for map, map key,
map value, program we want to do that, because it's way too easy for
user-space to specify huge sizes and allocation is proportional to
that size. But in this case links are fixed-sized objects, same as
struct file and struct inode. From BPF world, for instance, there is
struct bpf_prog_list, which is created when user is attaching BPF
program to cgroup, so it is user-space driven in similar sense. Yet we
allocate it without __GFP_NOWARN.

> If we missed it in other link allocs we should probably add it.

Before bpf_link was formalized, raw_tracepoint_open was creating struct bpf_raw_tracepoint, without NOWARN.
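
To make the distinction concrete, here is a rough user-space sketch of the
two allocation patterns being compared. This is not kernel code: mock_kzalloc(),
the MOCK_GFP_* flags, MOCK_ALLOC_LIMIT, struct mock_link and the two helper
functions are all made-up stand-ins, and the "warning" is just a counter plus a
line on stderr. It only models the one behaviour under discussion, that a
failed allocation is reported unless the NOWARN flag is passed.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for GFP_USER / __GFP_NOWARN; values are arbitrary. */
#define MOCK_GFP_USER    0x1u
#define MOCK_GFP_NOWARN  0x2u

/* Pretend the allocator cannot satisfy any request above 1 MiB. */
#define MOCK_ALLOC_LIMIT (1u << 20)

/* Counts how many failure warnings would have been logged. */
static unsigned int mock_warn_count;

/* Zeroing allocator that warns on failure unless MOCK_GFP_NOWARN is set. */
static void *mock_kzalloc(size_t size, unsigned int flags)
{
	if (size > MOCK_ALLOC_LIMIT) {
		if (!(flags & MOCK_GFP_NOWARN)) {
			mock_warn_count++;
			fprintf(stderr, "mock: allocation failure, size=%zu\n", size);
		}
		return NULL;
	}
	return calloc(1, size);
}

/* Fixed-size object, analogous to a link. */
struct mock_link {
	int prog_id;
	int type;
};

/*
 * Size comes straight from the caller (think map value size), so a huge
 * request is trivial to trigger; NOWARN keeps the log quiet.
 */
static void *alloc_map_value(size_t value_size)
{
	return mock_kzalloc(value_size, MOCK_GFP_USER | MOCK_GFP_NOWARN);
}

/*
 * Size is a small compile-time constant; the caller cannot inflate it,
 * so a failure here would be worth seeing in the log.
 */
static struct mock_link *alloc_link(void)
{
	return mock_kzalloc(sizeof(struct mock_link), MOCK_GFP_USER);
}
```

In this model an oversized alloc_map_value() request fails silently, while
alloc_link() never comes near the limit, which is why the flag buys little for
fixed-size objects.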