Re: [PATCH v5 bpf-next 07/12] bpf: tcp: Add bpf_skops_hdr_opt_len() and bpf_skops_write_hdr_opt()

Mat Martineau <mathew.j.martineau@xxxxxxxxxxxxxxx> · Thu, 20 Aug 2020 15:39:21 -0700 (PDT)

On Thu, 20 Aug 2020, Martin KaFai Lau wrote:

The bpf prog needs to parse the SYN header to learn what options have
been sent by the peer's bpf-prog before writing its options into SYNACK.
This patch adds a "syn_skb" arg to tcp_make_synack() and send_synack().
This syn_skb will eventually be made available (as read-only) to the
bpf prog.  This will be the only SYN packet available to the bpf
prog during syncookie.  For other regular cases, the bpf prog can
also use the saved_syn.

When writing options, the bpf prog will first be called to tell the
kernel its required number of bytes.  It is done by the new
bpf_skops_hdr_opt_len().  The bpf prog will only be called when the new
BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG is set in tp->bpf_sock_ops_cb_flags.
When the bpf prog returns, the kernel will know how many bytes are needed
and then update the "*remaining" arg accordingly.  4 byte alignment will
be included in the "*remaining" before this function returns.  The 4 byte
aligned number of bytes will also be stored into the opts->bpf_opt_len.
"bpf_opt_len" is a newly added member to the struct tcp_out_options.

Then the new bpf_skops_write_hdr_opt() will call the bpf prog to write the
header options.  The bpf prog is only called if it has reserved spaces
before (opts->bpf_opt_len > 0).

The bpf prog is the last one getting a chance to reserve header space
and writing the header option.

These two functions are half implemented to highlight the changes in
TCP stack.  The actual codes preparing the bpf running context and
invoking the bpf prog will be added in the later patch with other
necessary bpf pieces.

Reviewed-by: Eric Dumazet <edumazet@xxxxxxxxxx>
Signed-off-by: Martin KaFai Lau <kafai@xxxxxx>
---
include/net/tcp.h              |   6 +-
include/uapi/linux/bpf.h       |   3 +-
net/ipv4/tcp_input.c           |   5 +-
net/ipv4/tcp_ipv4.c            |   5 +-
net/ipv4/tcp_output.c          | 105 +++++++++++++++++++++++++++++----
net/ipv6/tcp_ipv6.c            |   5 +-
tools/include/uapi/linux/bpf.h |   3 +-
7 files changed, 109 insertions(+), 23 deletions(-)


...

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 5084333b5ab6..631a5ee0dd4e 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c

...

@@ -826,6 +886,15 @@ static unsigned int tcp_established_options(struct sock *sk, struct sk_buff *skb
			opts->num_sack_blocks * TCPOLEN_SACK_PERBLOCK;
	}

+	if (unlikely(BPF_SOCK_OPS_TEST_FLAG(tp,
+					    BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG))) {
+		unsigned int remaining = MAX_TCP_OPTION_SPACE - size;
+
+		bpf_skops_hdr_opt_len(sk, skb, NULL, NULL, 0, opts, &remaining);
+
+		size = MAX_TCP_OPTION_SPACE - remaining;
+	}
+
	return size;
}


Since bpf_skops_hdr_opt_len() is called after the SACK code tries to use 
up all remaining option space for SACK blocks, it's less likely that there 
will be sufficient space remaining. Did you consider moving this hunk 
before the SACK option space is allocated to give the bpf prog higher 
priority?

--
Mat Martineau
Intel