> On Wed, Jan 11, 2023 at 3:01 AM Ziyang Xuan > <william.xuanziyang@xxxxxxxxxx> wrote: >> >> Add ipip6 and ip6ip decap support for bpf_skb_adjust_room(). >> Main use case is for using cls_bpf on ingress hook to decapsulate >> IPv4 over IPv6 and IPv6 over IPv4 tunnel packets. >> >> Add two new flags BPF_F_ADJ_ROOM_DECAP_L3_IPV{4,6} to indicate the >> new IP header version after decapsulating the outer IP header. >> >> Suggested-by: Willem de Bruijn <willemb@xxxxxxxxxx> >> Signed-off-by: Ziyang Xuan <william.xuanziyang@xxxxxxxxxx> >> --- >> include/uapi/linux/bpf.h | 8 ++++++++ >> net/core/filter.c | 26 +++++++++++++++++++++++++- >> tools/include/uapi/linux/bpf.h | 8 ++++++++ >> 3 files changed, 41 insertions(+), 1 deletion(-) >> >> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h >> index 464ca3f01fe7..dde1c2ea1c84 100644 >> --- a/include/uapi/linux/bpf.h >> +++ b/include/uapi/linux/bpf.h >> @@ -2644,6 +2644,12 @@ union bpf_attr { >> * Use with BPF_F_ADJ_ROOM_ENCAP_L2 flag to further specify the >> * L2 type as Ethernet. >> * >> + * * **BPF_F_ADJ_ROOM_DECAP_L3_IPV4**, >> + * **BPF_F_ADJ_ROOM_DECAP_L3_IPV6**: >> + * Indicate the new IP header version after decapsulating the >> + * outer IP header. Mainly used in scenarios that the inner and >> + * outer IP versions are different. >> + * > > Nit (only since I have another comment below) > > Indicate -> Set Sorry, I think "Indicate" maybe more suitable. Because the new IP header is original inner IP header, it's not be changed. The flags assist the kernel to better complete specific tasks. I think "Set" has a meaning of change. > [Mainly used .. that] -> [Used when] This looks good to me. Thanks! > >> if (skb_is_gso(skb)) { >> struct skb_shared_info *shinfo = skb_shinfo(skb); >> >> @@ -3596,6 +3609,10 @@ BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff, >> if (unlikely(proto != htons(ETH_P_IP) && >> proto != htons(ETH_P_IPV6))) >> return -ENOTSUPP; >> + if (unlikely((shrink && ((flags & BPF_F_ADJ_ROOM_DECAP_L3_MASK) == >> + BPF_F_ADJ_ROOM_DECAP_L3_MASK)) || (!shrink && >> + flags & BPF_F_ADJ_ROOM_DECAP_L3_MASK))) >> + return -EINVAL; >> >> off = skb_mac_header_len(skb); >> switch (mode) { >> @@ -3608,6 +3625,13 @@ BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff, >> return -ENOTSUPP; >> } >> >> + if (shrink) { >> + if (flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV6) >> + len_min = sizeof(struct ipv6hdr); >> + else if (flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV4) >> + len_min = sizeof(struct iphdr); >> + } >> + > > How about combining this branch with the above: > > if (flags & BPF_F_ADJ_ROOM_DECAP_L3_MASK) { > if (!shrink) > return -EINVAL; > > switch (flags & BPF_F_ADJ_ROOM_DECAP_L3_MASK) { > case BPF_F_ADJ_ROOM_DECAP_L3_IPV4: > len_min = sizeof(struct iphdr); > break; > case BPF_F_ADJ_ROOM_DECAP_L3_IPV6: > len_min = sizeof(struct ipv6hdr); > break; > default: > return -EINVAL; > } > This looks good to me. Thanks! >