From: Yonghong Song <yhs@xxxxxx>
Date: Fri, 30 Jul 2021 00:09:08 -0700
> On 7/29/21 11:53 PM, Kuniyuki Iwashima wrote:
> > From: Yonghong Song <yhs@xxxxxx>
> > Date: Thu, 29 Jul 2021 23:24:41 -0700
> >> On 7/29/21 4:36 PM, Kuniyuki Iwashima wrote:
> >>> This patch implements the BPF iterator for the UNIX domain socket.
> >>>
> >>> Currently, the batch optimization introduced for the TCP iterator in the
> >>> commit 04c7820b776f ("bpf: tcp: Bpf iter batching and lock_sock") is not
> >>> applied. It will require replacing the big lock for the hash table with
> >>> small locks for each hash list not to block other processes.
> >>
> >> Thanks for the contribution. The patch looks okay except
> >> missing seq_ops->stop implementation, see below for more explanation.
> >>
> >>>
> >>> Signed-off-by: Kuniyuki Iwashima <kuniyu@xxxxxxxxxxxx>
> >>> ---
> >>>  include/linux/btf_ids.h |  3 +-
> >>>  net/unix/af_unix.c      | 78 +++++++++++++++++++++++++++++++++++++++++
> >>>  2 files changed, 80 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/include/linux/btf_ids.h b/include/linux/btf_ids.h
> >>> index 57890b357f85..bed4b9964581 100644
> >>> --- a/include/linux/btf_ids.h
> >>> +++ b/include/linux/btf_ids.h
> >>> @@ -172,7 +172,8 @@ extern struct btf_id_set name;
> >>>  	BTF_SOCK_TYPE(BTF_SOCK_TYPE_TCP_TW, tcp_timewait_sock) \
> >>>  	BTF_SOCK_TYPE(BTF_SOCK_TYPE_TCP6, tcp6_sock) \
> >>>  	BTF_SOCK_TYPE(BTF_SOCK_TYPE_UDP, udp_sock) \
> >>> -	BTF_SOCK_TYPE(BTF_SOCK_TYPE_UDP6, udp6_sock)
> >>> +	BTF_SOCK_TYPE(BTF_SOCK_TYPE_UDP6, udp6_sock) \
> >>> +	BTF_SOCK_TYPE(BTF_SOCK_TYPE_UNIX, unix_sock)
> >>>
> >>>  enum {
> >>>  #define BTF_SOCK_TYPE(name, str) name,
> >>> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> >>> index 89927678c0dc..d45ad87e3a49 100644
> >>> --- a/net/unix/af_unix.c
> >>> +++ b/net/unix/af_unix.c
> >>> @@ -113,6 +113,7 @@
> >>>  #include <linux/security.h>
> >>>  #include <linux/freezer.h>
> >>>  #include <linux/file.h>
> >>> +#include <linux/btf_ids.h>
> >>>
> >>>  #include "scm.h"
> >>>
> >>> @@ -2935,6 +2936,49 @@ static const struct seq_operations unix_seq_ops = {
> >>>  	.stop = unix_seq_stop,
> >>>  	.show = unix_seq_show,
> >>>  };
> >>> +
> >>> +#ifdef CONFIG_BPF_SYSCALL
> >>> +struct bpf_iter__unix {
> >>> +	__bpf_md_ptr(struct bpf_iter_meta *, meta);
> >>> +	__bpf_md_ptr(struct unix_sock *, unix_sk);
> >>> +	uid_t uid __aligned(8);
> >>> +};
> >>> +
> >>> +static int unix_prog_seq_show(struct bpf_prog *prog, struct bpf_iter_meta *meta,
> >>> +			      struct unix_sock *unix_sk, uid_t uid)
> >>> +{
> >>> +	struct bpf_iter__unix ctx;
> >>> +
> >>> +	meta->seq_num--; /* skip SEQ_START_TOKEN */
> >>> +	ctx.meta = meta;
> >>> +	ctx.unix_sk = unix_sk;
> >>> +	ctx.uid = uid;
> >>> +	return bpf_iter_run_prog(prog, &ctx);
> >>> +}
> >>> +
> >>> +static int bpf_iter_unix_seq_show(struct seq_file *seq, void *v)
> >>> +{
> >>> +	struct bpf_iter_meta meta;
> >>> +	struct bpf_prog *prog;
> >>> +	struct sock *sk = v;
> >>> +	uid_t uid;
> >>> +
> >>> +	if (v == SEQ_START_TOKEN)
> >>> +		return 0;
> >>> +
> >>> +	uid = from_kuid_munged(seq_user_ns(seq), sock_i_uid(sk));
> >>> +	meta.seq = seq;
> >>> +	prog = bpf_iter_get_info(&meta, false);
> >>> +	return unix_prog_seq_show(prog, &meta, v, uid);
> >>> +}
> >>> +
> >>> +static const struct seq_operations bpf_iter_unix_seq_ops = {
> >>> +	.start = unix_seq_start,
> >>> +	.next = unix_seq_next,
> >>> +	.stop = unix_seq_stop,
> >>
> >> Although it is not required for /proc/net/unix, we should still
> >> implement bpf_iter version of seq_ops->stop here. The main purpose
> >> of bpf_iter specific seq_ops->stop is to call bpf program one
> >> more time after ALL elements have been traversed. Such
> >> functionality is implemented in all other bpf_iter variants.
> >
> > Thanks for your review!
> > I will implement the extra call in the next spin.
> >
> > Just out of curiosity, is there a specific use case for the last call?
> > We don't have use cases for dumps similar to /proc/net/... etc.
>
> The original thinking is to permit in-kernel aggregation and the
> seq_ops->stop() bpf program will have an indication as the last
> bpf program invocation for the iterator at which point bpf program
> may wrap up aggregation and send/signal the result to user space.
> I am not sure whether people already used this feature or not, or
> people may have different way to do that (e.g., from user space
> directly checking map value if read() length is 0). But
> bpf seq_ops->stop() provides an in-kernel way for bpf program
> to respond to the end of iterating.

Aggregation, that makes sense.
Thank you!
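
For illustration, the aggregation pattern described above can be modelled in
plain userspace C. This is only a sketch under stated assumptions, not kernel
code and not the patch's implementation: mock_sock, agg_state, mock_prog and
mock_iterate are hypothetical names, the agg_state struct merely stands in for
a BPF map, and the sketch assumes the "last invocation" indication is a NULL
element pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one socket entry (not struct unix_sock). */
struct mock_sock {
	unsigned int uid;
};

/* Hypothetical aggregation state, playing the role of a BPF map. */
struct agg_state {
	unsigned long count;	/* elements seen so far */
	int finished;		/* set once the final call has happened */
};

/*
 * Models the BPF program. Assumption: a NULL element marks the extra
 * invocation made from ->stop() after all elements were traversed.
 */
static int mock_prog(const struct mock_sock *sk, struct agg_state *agg)
{
	if (!sk) {
		/* End of iteration: wrap up and publish the result. */
		agg->finished = 1;
		return 0;
	}

	agg->count++;
	return 0;
}

/* Models the iterator: one ->show() per element, then one ->stop(). */
static void mock_iterate(const struct mock_sock *socks, size_t n,
			 struct agg_state *agg)
{
	size_t i;

	assert(agg != NULL);

	for (i = 0; i < n; i++)
		mock_prog(&socks[i], agg);	/* ->show() equivalent */

	mock_prog(NULL, agg);			/* ->stop() equivalent */
}
```

Calling mock_iterate() over three entries leaves count at 3 and finished set,
so a reader of the state can tell a completed aggregation from a partial one
without relying on read() returning 0.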