On Fri, Jul 31, 2020 at 09:06:57AM -0700, Eric Dumazet wrote:
> On Thu, Jul 30, 2020 at 1:57 PM Martin KaFai Lau <kafai@xxxxxx> wrote:
> >
> > The earlier effort in BPF-TCP-CC allows the TCP Congestion Control
> > algorithm to be written in BPF. It opens up opportunities for
> > a faster turnaround time in testing/releasing new congestion control
> > ideas to the production environment.
> >
> > The same flexibility can be extended to writing TCP header options.
> > It is not uncommon that people want to test a new TCP header option
> > to improve TCP performance. Another use case is a data center
> > that has a more controlled environment and more flexibility in
> > putting header options for internal-only use.
> >
> > For example, we want to test the idea of putting the maximum delay
> > ACK in a TCP header option, which is similar to a draft RFC proposal [1].
> >
> > This patch introduces the necessary BPF API and uses it in the
> > TCP stack to allow a BPF_PROG_TYPE_SOCK_OPS program to parse
> > and write TCP header options. It currently supports most
> > TCP packets except RST.
> >
> > Supported TCP header option:
> > ───────────────────────────
> > This patch allows the bpf-prog to write any option kind.
> > Different bpf-progs can write their own options by calling the new
> > helper bpf_store_hdr_opt(). The helper ensures there is no duplicated
> > option in the header.
> >
> > Allowing the bpf-prog to write any option kind gives it a lot of
> > flexibility. Different bpf-progs can write their own option kinds.
> > It also allows a bpf-prog to support a recently standardized
> > option on an older kernel.
> >
> > Sockops Callback Flags:
> > ──────────────────────
> > The header parsing and writing callbacks can be turned on
> > by enabling a few newly added callback flags:
> >
> > BPF_SOCK_OPS_PARSE_UNKNOWN_HDR_OPT_CB_FLAG:
> > Call bpf when the kernel has received a header option that
> > the kernel cannot handle.
> > It is useful when the peer doesn't
> > send bpf-options very often.
> >
> > The bpf-prog can inspect the received header through sock_ops->skb_data,
> > which covers the whole header (including the fixed fields like
> > ports, flags, etc.), or
> > use the new bpf_load_hdr_opt() to search for a particular TCP
> > header option.
> >
> > [1]: draft-wang-tcpm-low-latency-opt-00
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__tools.ietf.org_html_draft-2Dwang-2Dtcpm-2Dlow-2Dlatency-2Dopt-2D00&d=DwIFaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=VQnoQ7LvghIj0gVEaiQSUw&m=Z-syoz304fodO8xPKCcJh0QYhXbb7_XVuRgTINFba2U&s=Ad66Zb5r0utWgnrB-QuDXBft6G1HXW2C_aBV9fTMxoo&e=
> >
> > Signed-off-by: Martin KaFai Lau <kafai@xxxxxx>
> > ---
> >  include/linux/bpf-cgroup.h     |  25 +++
> >  include/linux/filter.h         |   4 +
> >  include/net/tcp.h              |  53 ++++-
> >  include/uapi/linux/bpf.h       | 231 ++++++++++++++++++++-
> >  net/core/filter.c              | 365 +++++++++++++++++++++++++++++++++
> >  net/ipv4/tcp_fastopen.c        |   2 +-
> >  net/ipv4/tcp_input.c           |  86 +++++++-
> >  net/ipv4/tcp_ipv4.c            |   3 +-
> >  net/ipv4/tcp_minisocks.c       |   1 +
> >  net/ipv4/tcp_output.c          | 194 ++++++++++++++++--
> >  net/ipv6/tcp_ipv6.c            |   3 +-
> >  tools/include/uapi/linux/bpf.h | 231 ++++++++++++++++++++-
> >  12 files changed, 1171 insertions(+), 27 deletions(-)
>
> This is a truly gigantic patch.
>
> Could you split it in maybe two parts?
Yes. Most of the code changes in TCP are callouts to the bpf prog to
parse and write the header, so they all ended up in this one patch.
I will move those callout changes (and a few function argument changes)
in TCP to a separate patch, leaving the bpf callout functions empty.
The next, bpf-specific patch will then fill out those empty bpf callout
functions.

>
> This way I could focus on the TCP changes, and let eBPF experts focus
> on BPF changes.
Thanks for the review!
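The quoted description of bpf_load_hdr_opt() (search for a particular option) and bpf_store_hdr_opt() (write an option, refusing duplicates) rests on the standard TCP option layout from RFC 793: kind 0 ends the list, kind 1 is a one-byte NOP, and every other option is kind, length, data, with the length byte covering all three. Below is a minimal userspace sketch of that walk, assuming a raw options buffer of at most 40 bytes (the most a TCP header can carry). find_tcp_opt() and store_tcp_opt() are hypothetical names; this is neither the kernel's implementation nor the real helpers' signatures.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Option kinds fixed by RFC 793: kind 0 ends the list, kind 1 is padding. */
#define TCPOPT_EOL 0
#define TCPOPT_NOP 1

/* Hypothetical helper: return the offset of the first option of `kind`
 * in the options area `opts` (`len` bytes), or -1 if it is absent or
 * the area is malformed before it is found. */
static int find_tcp_opt(const uint8_t *opts, size_t len, uint8_t kind)
{
    size_t i = 0;

    while (i < len) {
        uint8_t k = opts[i];

        if (k == TCPOPT_EOL)
            return -1;
        if (k == TCPOPT_NOP) {
            i++;
            continue;
        }
        /* kind, length, data: the length byte must exist, be >= 2,
         * and must not run past the end of the options area. */
        if (i + 1 >= len || opts[i + 1] < 2 || i + opts[i + 1] > len)
            return -1;
        if (k == kind)
            return (int)i;
        i += opts[i + 1];
    }
    return -1;
}

/* Hypothetical helper: append an already-encoded option `opt`
 * (kind, length, data) at offset *used, refusing a duplicate kind and
 * refusing to overflow `cap` bytes. Returns 0 on success, -1 otherwise. */
static int store_tcp_opt(uint8_t *opts, size_t cap, size_t *used,
                         const uint8_t *opt)
{
    uint8_t olen = opt[1];

    if (olen < 2)
        return -1;              /* malformed option */
    if (find_tcp_opt(opts, *used, opt[0]) >= 0)
        return -1;              /* same kind already present */
    if (*used + olen > cap)
        return -1;              /* not enough room left */
    memcpy(opts + *used, opt, olen);
    *used += olen;
    return 0;
}
```

For example, storing an MSS option (kind 2, length 4) and then an option of kind 254 (one of the experimental kinds reserved by RFC 4727) into a 40-byte buffer succeeds, while storing the kind-254 option a second time is rejected. Rejecting a duplicate kind mirrors the "no duplicated option" guarantee described for bpf_store_hdr_opt().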