Re: [PATCH bpf-next v2 1/2] bpf: tcp: prevent bpf_reserve_hdr_opt() from growing skb larger than MTU

Martin KaFai Lau <martin.lau@xxxxxxxxx> · Thu, 29 Aug 2024 17:20:07 -0700

On 8/29/24 9:46 AM, Cong Wang wrote:
On Wed, Aug 28, 2024 at 02:29:17PM -0700, Martin KaFai Lau wrote:
On 8/26/24 6:37 PM, zijianzhang@xxxxxxxxxxxxx wrote:
From: Amery Hung <amery.hung@xxxxxxxxxxxxx>

This series prevents sockops users from accidentally causing packet
drops. This can happen when a BPF_SOCK_OPS_HDR_OPT_LEN_CB program
reserves different option lengths in tcp_sendmsg().

Initially, sockops BPF_SOCK_OPS_HDR_OPT_LEN_CB program will be called to
reserve a space in tcp_send_mss(), which will return the MSS for TSO.
Then, BPF_SOCK_OPS_HDR_OPT_LEN_CB will be called in __tcp_transmit_skb()
again to calculate the actual tcp_option_size and skb_push() the total
header size.

skb->gso_size is restored from TCP_SKB_CB(skb)->tcp_gso_size, which is
derived from tcp_send_mss() where we first call HDR_OPT_LEN. If the
reserved opt size is smaller than the actual header size, the len of the
skb can exceed the MTU. As a result, ip(6)_fragment will drop the
packet if skb->ignore_df is not set.

To prevent this accidental packet drop, we need to make sure the
second call to the BPF_SOCK_OPS_HDR_OPT_LEN_CB program reserves space
not more than the first time.

iiuc, it is a bug in the bpf prog itself that did not reserve the same
header length and caused a drop. It is not the only drop case though for an
incorrect bpf prog. There are other cases where a bpf prog can accidentally
drop a packet.

But safety is the most important thing for eBPF programs, do we really
allow this kind of bug to happen in eBPF programs?

Do you have an actual use case that the bpf prog cannot reserve the correct
header length for the same sk ?

You can think of it as a simple call of bpf_get_prandom_u32():

SEC("sockops")
int bpf_sock_ops_cb(struct bpf_sock_ops *skops)
{
     if (skops->op == BPF_SOCK_OPS_HDR_OPT_LEN_CB) {
         return bpf_get_prandom_u32();

It compiles but this won't even pass the verifier.

If you want to go to this extreme to consider any random bpf program, it is 
essentially deploying a fuzzer to the production traffic, right? Sure, syzbot 
has programs that are rejected by verifier or cause a packet drop. afaik, I 
don't recall syzbot reported them as a crash.

Is a drop a safety issue as you said? I don't think so. It is why this set is 
tagged as bpf-next also and I agree. What is the hurry here?

Is it something that can be improved? may be. Note that the patchwork status has 
not been changed yet. Instead of wasting time here, please allow Zijian (/Amery) 
to continue the discussion on another thread. I have asked question on the 
changes and also suggested other ways.

     }
     return 0;
}

And eBPF programs are stateful anyway, at least we should not assume
it is stateless since maps are commonly used. Therefore, different invocations
of a same eBPF program are expected to return different values. IMHO,
this is a situation we have to deal with in the kernel, hence stricter
checks are reasonable and necessary.