On Wed, Apr 8, 2020 at 4:26 PM Yonghong Song <yhs@xxxxxx> wrote: > > Two helpers bpf_seq_printf and bpf_seq_write, are added for > writing data to the seq_file buffer. > > bpf_seq_printf supports common format string flag/width/type > fields so at least I can get identical results for > netlink and ipv6_route targets. > > For bpf_seq_printf, return value 1 specifically indicates > a write failure due to overflow in order to differentiate > the failure from format strings. > > For seq_file show, since the same object may be called > twice, some bpf_prog might be sensitive to this. With return > value indicating overflow happens the bpf program can > react differently. > > Signed-off-by: Yonghong Song <yhs@xxxxxx> > --- > include/uapi/linux/bpf.h | 18 +++- > kernel/trace/bpf_trace.c | 172 +++++++++++++++++++++++++++++++++ > scripts/bpf_helpers_doc.py | 2 + > tools/include/uapi/linux/bpf.h | 18 +++- > 4 files changed, 208 insertions(+), 2 deletions(-) > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > index b51d56fc77f9..a245f0df53c4 100644 > --- a/include/uapi/linux/bpf.h > +++ b/include/uapi/linux/bpf.h > @@ -3030,6 +3030,20 @@ union bpf_attr { > * * **-EOPNOTSUPP** Unsupported operation, for example a > * call from outside of TC ingress. > * * **-ESOCKTNOSUPPORT** Socket type not supported (reuseport). > + * > + * int bpf_seq_printf(struct seq_file *m, const char *fmt, u32 fmt_size, ...) > + * Description > + * seq_printf > + * Return > + * 0 if successful, or > + * 1 if failure due to buffer overflow, or > + * a negative value for format string related failures. This encoding feels a bit arbitrary, why not stick to normal error codes and return, for example, EAGAIN on overflow (or EOVERFLOW?..) > + * > + * int bpf_seq_write(struct seq_file *m, const void *data, u32 len) > + * Description > + * seq_write > + * Return > + * 0 if successful, non-zero otherwise. Especially given that bpf_seq_write will probably return <0 on the same error? > */ > #define __BPF_FUNC_MAPPER(FN) \ > FN(unspec), \ > @@ -3156,7 +3170,9 @@ union bpf_attr { > FN(xdp_output), \ > FN(get_netns_cookie), \ > FN(get_current_ancestor_cgroup_id), \ > - FN(sk_assign), > + FN(sk_assign), \ > + FN(seq_printf), \ > + FN(seq_write), > > /* integer value in 'imm' field of BPF_CALL instruction selects which helper > * function eBPF program intends to call > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c > index ca1796747a77..e7d6ba7c9c51 100644 > --- a/kernel/trace/bpf_trace.c > +++ b/kernel/trace/bpf_trace.c > @@ -457,6 +457,174 @@ const struct bpf_func_proto *bpf_get_trace_printk_proto(void) > return &bpf_trace_printk_proto; > } > > +BPF_CALL_5(bpf_seq_printf, struct seq_file *, m, char *, fmt, u32, fmt_size, u64, arg1, > + u64, arg2) > +{ I honestly didn't dare to look at implementation below, but this limitation of only up to 2 arguments in bpf_seq_printf (arg1 and arg2) seem extremely limiting. It might be ok for bpf_printk, but not for more general and non-debugging bpf_seq_printf. How about instead of passing arguments as 4th and 5th argument, bpf_seq_printf would require passing a pointer to a long array, where each item corresponds to printf argument? So on BPF program side, one would have to do this, to printf 5 arguments; long __tmp_arr[] = { 123, pointer_to_str, some_input_int, some_input_long, 5 * arg_x }; return bpf_seq_printf(m, fmt, fmt_size, &__tmp_arr, sizeof(__tmp_arr)); And the bpf_seq_printf would know that 4th argument is a pointer to an array of size provided in 5th argument and process them accordingly. This would theoretically allow to have arbitrary number of arguments. This local array construction can be abstracted into macro, of course. Would something like this be possible? [...] > +/* Horrid workaround for getting va_list handling working with different > + * argument type combinations generically for 32 and 64 bit archs. > + */ > +#define __BPF_SP_EMIT() __BPF_ARG2_SP() > +#define __BPF_SP(...) \ > + seq_printf(m, fmt, ##__VA_ARGS__) > + > +#define __BPF_ARG1_SP(...) \ > + ((mod[0] == 2 || (mod[0] == 1 && __BITS_PER_LONG == 64)) \ > + ? __BPF_SP(arg1, ##__VA_ARGS__) \ > + : ((mod[0] == 1 || (mod[0] == 0 && __BITS_PER_LONG == 32)) \ > + ? __BPF_SP((long)arg1, ##__VA_ARGS__) \ > + : __BPF_SP((u32)arg1, ##__VA_ARGS__))) > + > +#define __BPF_ARG2_SP(...) \ > + ((mod[1] == 2 || (mod[1] == 1 && __BITS_PER_LONG == 64)) \ > + ? __BPF_ARG1_SP(arg2, ##__VA_ARGS__) \ > + : ((mod[1] == 1 || (mod[1] == 0 && __BITS_PER_LONG == 32)) \ > + ? __BPF_ARG1_SP((long)arg2, ##__VA_ARGS__) \ > + : __BPF_ARG1_SP((u32)arg2, ##__VA_ARGS__))) hm... wouldn't this make it impossible to print 64-bit numbers on 32-bit arches? It seems to be truncating to 32-bit unconditionally.... > + > + __BPF_SP_EMIT(); > + return seq_has_overflowed(m); > +} > + [...]