On Wed, Jun 17, 2020 at 5:21 PM Yonghong Song <yhs@xxxxxx> wrote: > > > On 6/17/20 3:51 PM, Andrii Nakryiko wrote: > > On Wed, Jun 17, 2020 at 2:16 PM Yonghong Song <yhs@xxxxxx> wrote: > > In my VM, I got identical result compared to /proc/net/{tcp,tcp6}. > For tcp6: > $ cat /proc/net/tcp6 > sl local_address remote_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode > 0: 00000000000000000000000000000000:0016 00000000000000000000000000000000:0000 0A 00000000:00000000 00:00000001 00000000 0 0 17955 1 000000003eb3102e 100 0 0 10 0 > > $ cat /sys/fs/bpf/p1 > sl local_address remote_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode > 0: 00000000000000000000000000000000:0016 00000000000000000000000000000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 17955 1 000000003eb3102e 100 0 0 10 0 > > For tcp: > $ cat /proc/net/tcp > sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode > 0: 00000000:0016 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 2666 1 000000007152e43f 100 0 0 10 0 > $ cat /sys/fs/bpf/p2 > sl local_address remote_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode > 1: 00000000:0016 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 2666 1 000000007152e43f 100 0 0 10 0 > > Signed-off-by: Yonghong Song <yhs@xxxxxx> > --- > tools/testing/selftests/bpf/progs/bpf_iter.h | 15 + > .../selftests/bpf/progs/bpf_iter_tcp4.c | 261 +++++++++++++++++ > .../selftests/bpf/progs/bpf_iter_tcp6.c | 277 ++++++++++++++++++ > 3 files changed, 553 insertions(+) > create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_tcp4.c > create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_tcp6.c > > diff --git a/tools/testing/selftests/bpf/progs/bpf_iter.h b/tools/testing/selftests/bpf/progs/bpf_iter.h > index d8e6820e49e6..ab3ed904d391 100644 > --- a/tools/testing/selftests/bpf/progs/bpf_iter.h > +++ b/tools/testing/selftests/bpf/progs/bpf_iter.h > @@ -7,6 +7,8 @@ > #define bpf_iter__netlink bpf_iter__netlink___not_used > #define bpf_iter__task bpf_iter__task___not_used > #define bpf_iter__task_file bpf_iter__task_file___not_used > +#define bpf_iter__tcp bpf_iter__tcp___not_used > +#define tcp6_sock tcp6_sock___not_used > #include "vmlinux.h" > #undef bpf_iter_meta > #undef bpf_iter__bpf_map > @@ -14,6 +16,8 @@ > #undef bpf_iter__netlink > #undef bpf_iter__task > #undef bpf_iter__task_file > +#undef bpf_iter__tcp > +#undef tcp6_sock > > struct bpf_iter_meta { > struct seq_file *seq; > @@ -47,3 +51,14 @@ struct bpf_iter__bpf_map { > struct bpf_iter_meta *meta; > struct bpf_map *map; > } __attribute__((preserve_access_index)); > + > +struct bpf_iter__tcp { > + struct bpf_iter_meta *meta; > + struct sock_common *sk_common; > + uid_t uid; > +} __attribute__((preserve_access_index)); > + > +struct tcp6_sock { > + struct tcp_sock tcp; > + struct ipv6_pinfo inet6; > +} __attribute__((preserve_access_index)); > > Why redefining struct tcp6_sock? It seems it's been defined forever, > so should just come from vmlinux.h > > Unfortunately, it is not in vmlinux.h. In the kernel code, tcp6_sock > is referenced with sizeof() operator and ipv6_pinfo is referenced > through through (struct ipv6_pinfo *)(ptr + sizeof(struct tcp6_sock) - sizeof(struct ipv6_pinfo)). > I have a minor comment in patch #5 for this. Ah, I see, so it wasn't there, but the latest kernel will be there. Ok. BTW, there is this BTF_TYPE_EMIT() macro (used by struct_ops), it would probably be good to use it instead of ad-hoc type casting in patch #5, to preserve type info. It's more explicitly stating the intent. > > diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_tcp4.c b/tools/testing/selftests/bpf/progs/bpf_iter_tcp4.c > new file mode 100644 > index 000000000000..37008b914334 > --- /dev/null > +++ b/tools/testing/selftests/bpf/progs/bpf_iter_tcp4.c > @@ -0,0 +1,261 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* Copyright (c) 2020 Facebook */ > +#include "bpf_iter.h" > +#include <bpf/bpf_helpers.h> > +#include <bpf/bpf_tracing.h> > +#include <bpf/bpf_endian.h> > + > +char _license[] SEC("license") = "GPL"; > + > +#define sk_state __sk_common.skc_state > +#define sk_refcnt __sk_common.skc_refcnt > + > +#define inet_daddr sk.__sk_common.skc_daddr > +#define inet_rcv_saddr sk.__sk_common.skc_rcv_saddr > +#define inet_dport sk.__sk_common.skc_dport > + > +#define tw_daddr __tw_common.skc_daddr > +#define tw_rcv_saddr __tw_common.skc_rcv_saddr > +#define tw_dport __tw_common.skc_dport > +#define tw_refcnt __tw_common.skc_refcnt > + > +#define ir_loc_addr req.__req_common.skc_rcv_saddr > +#define ir_rmt_addr req.__req_common.skc_daddr > +#define ir_rmt_port req.__req_common.skc_dport > +#define ir_num req.__req_common.skc_num > + > +#define ICSK_TIME_RETRANS 1 > +#define ICSK_TIME_PROBE0 3 > +#define ICSK_TIME_LOSS_PROBE 5 > +#define ICSK_TIME_REO_TIMEOUT 6 > + > +#define TCP_INFINITE_SSTHRESH 0x7fffffff > +#define TCP_PINGPONG_THRESH 3 > + > +#define AF_INET 2 > > Seems like this is needed to do anything useful with sockets, right? > How about adding a new helper header (e.g., bpf_sock.h or bpf_net.h?) > so that everyone can benefit? > > I have been thinking of a bpf_iter_net.h since I have several > net related bpf programs and they share some common macros and > static inline functions. But I am short of action in this revision. > > Yes, I agree it is a good idea to have a helper under > tools/lib/bpf for common use. Maybe I can call it > bpf_tracing_net.h to imply the header is for tracing programs > which ties to kernel internals and its contents may, although unlikely, > change. bpf_iter_net.h is too iter-specific, while this is useful more broadly. bpf_tracing_net.h is fine, I guess, but appending "tracing" to everything that is intended for use on the BPF side sounds a bit like an overkill... But don't know, no strong feelings there. > > > + > +static int hlist_unhashed_lockless(const struct hlist_node *h) > +{ > + return !(h->pprev); > +} > + > +static int timer_pending(const struct timer_list * timer) > +{ > + return !hlist_unhashed_lockless(&timer->entry); > +} > + > > [...]