On Wed, Aug 28, 2024 at 02:29:17PM -0700, Martin KaFai Lau wrote: > On 8/26/24 6:37 PM, zijianzhang@xxxxxxxxxxxxx wrote: > > From: Amery Hung <amery.hung@xxxxxxxxxxxxx> > > > > This series prevents sockops users from accidentally causing packet > > drops. This can happen when a BPF_SOCK_OPS_HDR_OPT_LEN_CB program > > reserves different option lengths in tcp_sendmsg(). > > > > Initially, sockops BPF_SOCK_OPS_HDR_OPT_LEN_CB program will be called to > > reserve a space in tcp_send_mss(), which will return the MSS for TSO. > > Then, BPF_SOCK_OPS_HDR_OPT_LEN_CB will be called in __tcp_transmit_skb() > > again to calculate the actual tcp_option_size and skb_push() the total > > header size. > > > > skb->gso_size is restored from TCP_SKB_CB(skb)->tcp_gso_size, which is > > derived from tcp_send_mss() where we first call HDR_OPT_LEN. If the > > reserved opt size is smaller than the actual header size, the len of the > > skb can exceed the MTU. As a result, ip(6)_fragment will drop the > > packet if skb->ignore_df is not set. > > > > To prevent this accidental packet drop, we need to make sure the > > second call to the BPF_SOCK_OPS_HDR_OPT_LEN_CB program reserves space > > not more than the first time. > > iiuc, it is a bug in the bpf prog itself that did not reserve the same > header length and caused a drop. It is not the only drop case though for an > incorrect bpf prog. There are other cases where a bpf prog can accidentally > drop a packet. But safety is the most important thing for eBPF programs, do we really allow this kind of bug to happen in eBPF programs? > > Do you have an actual use case that the bpf prog cannot reserve the correct > header length for the same sk ? You can think of it as a simple call of bpf_get_prandom_u32(): SEC("sockops") int bpf_sock_ops_cb(struct bpf_sock_ops *skops) { if (skops->op == BPF_SOCK_OPS_HDR_OPT_LEN_CB) { return bpf_get_prandom_u32(); } return 0; } And eBPF programs are stateful anyway, at least we should not assume it is stateless since maps are commonly used. Therefore, different invocations of a same eBPF program are expected to return different values. IMHO, this is a situation we have to deal with in the kernel, hence stricter checks are reasonable and necessary. Thanks.