Hi Daniel, Thanks for working on this! On 12/15/22 12:25 AM, Daniel Xu wrote: [...]
+#include <linux/bpf.h> +#include <linux/btf_ids.h> +#include <linux/ip.h> +#include <linux/filter.h> +#include <linux/netdevice.h> +#include <net/ip.h> +#include <net/sock.h> + +__diag_push(); +__diag_ignore_all("-Wmissing-prototypes", + "Global functions as their definitions will be in ip_fragment BTF"); + +/* bpf_ip_check_defrag - Defragment an ipv4 packet + * + * This helper takes an skb as input. If this skb successfully reassembles + * the original packet, the skb is updated to contain the original, reassembled + * packet. + * + * Otherwise (on error or incomplete reassembly), the input skb remains + * unmodified. + * + * Parameters: + * @ctx - Pointer to program context (skb) + * @netns - Child network namespace id. If value is a negative signed + * 32-bit integer, the netns of the device in the skb is used. + * + * Return: + * 0 on successfully reassembly or non-fragmented packet. Negative value on + * error or incomplete reassembly. + */ +int bpf_ip_check_defrag(struct __sk_buff *ctx, u64 netns)
small nit: for sk lookup helper we've used u32 netns_id, would be nice to have this consistent here as well.
+{ + struct sk_buff *skb = (struct sk_buff *)ctx; + struct sk_buff *skb_cpy, *skb_out; + struct net *caller_net; + struct net *net; + int mac_len; + void *mac; + + if (unlikely(!((s32)netns < 0 || netns <= S32_MAX))) + return -EINVAL; + + caller_net = skb->dev ? dev_net(skb->dev) : sock_net(skb->sk); + if ((s32)netns < 0) { + net = caller_net; + } else { + net = get_net_ns_by_id(caller_net, netns); + if (unlikely(!net)) + return -EINVAL; + } + + mac_len = skb->mac_len; + skb_cpy = skb_copy(skb, GFP_ATOMIC); + if (!skb_cpy) + return -ENOMEM;
Given slow path, this idea is expensive but okay. Maybe in future it could be lifted which might be a bigger lift to teach verifier that input ctx cannot be accessed anymore.. but then frags are very much discouraged either way and bpf_ip_check_defrag() might only apply in corner case situations (like DNS, etc).
+ skb_out = ip_check_defrag(net, skb_cpy, IP_DEFRAG_BPF); + if (IS_ERR(skb_out)) + return PTR_ERR(skb_out);
Looks like ip_check_defrag() can gracefully handle IPv6 packet. It will just return back skb_cpy pointer in that case. However, this brings me to my main complaint.. I don't think we should merge anything IPv4-related without also having IPv6 equivalent support, otherwise we're building up tech debt, so pls also add support for the latter.
+ skb_morph(skb, skb_out); + kfree_skb(skb_out); + + /* ip_check_defrag() does not maintain mac header, so push empty header + * in so prog sees the correct layout. The empty mac header will be + * later pulled from cls_bpf. + */ + mac = skb_push(skb, mac_len); + memset(mac, 0, mac_len); + bpf_compute_data_pointers(skb); + + return 0; +} +
Thanks, Daniel