Thanks for the review. On Wed, Oct 4, 2023, at 10:09 PM, Martin KaFai Lau wrote: > On 10/3/23 12:10 AM, Martynas Pumputis wrote: >> Extend the bpf_fib_lookup() helper by making it to return the source >> IPv4/IPv6 address if the BPF_FIB_LOOKUP_SET_SRC flag is set. >> >> For example, the following snippet can be used to derive the desired >> source IP address: >> >> struct bpf_fib_lookup p = { .ipv4_dst = ip4->daddr }; >> >> ret = bpf_skb_fib_lookup(skb, p, sizeof(p), >> BPF_FIB_LOOKUP_SET_SRC | BPF_FIB_LOOKUP_SKIP_NEIGH); >> if (ret != BPF_FIB_LKUP_RET_SUCCESS) >> return TC_ACT_SHOT; >> >> /* the p.ipv4_src now contains the source address */ >> >> The inability to derive the proper source address may cause malfunctions >> in BPF-based dataplanes for hosts containing netdevs with more than one >> routable IP address or for multi-homed hosts. >> >> For example, Cilium implements packet masquerading in BPF. If an >> egressing netdev to which the Cilium's BPF prog is attached has >> multiple IP addresses, then only one [hardcoded] IP address can be used for >> masquerading. This breaks connectivity if any other IP address should have >> been selected instead, for example, when a public and private addresses >> are attached to the same egress interface. >> >> The change was tested with Cilium [1]. >> >> Nikolay Aleksandrov helped to figure out the IPv6 addr selection. >> >> [1]: https://github.com/cilium/cilium/pull/28283 >> >> Signed-off-by: Martynas Pumputis <m@xxxxxxxxx> >> --- >> include/net/ipv6_stubs.h | 5 +++++ >> include/uapi/linux/bpf.h | 9 +++++++++ >> net/core/filter.c | 19 ++++++++++++++++++- >> net/ipv6/af_inet6.c | 1 + >> tools/include/uapi/linux/bpf.h | 10 ++++++++++ >> 5 files changed, 43 insertions(+), 1 deletion(-) >> >> diff --git a/include/net/ipv6_stubs.h b/include/net/ipv6_stubs.h >> index c48186bf4737..21da31e1dff5 100644 >> --- a/include/net/ipv6_stubs.h >> +++ b/include/net/ipv6_stubs.h >> @@ -85,6 +85,11 @@ struct ipv6_bpf_stub { >> sockptr_t optval, unsigned int optlen); >> int (*ipv6_getsockopt)(struct sock *sk, int level, int optname, >> sockptr_t optval, sockptr_t optlen); >> + int (*ipv6_dev_get_saddr)(struct net *net, >> + const struct net_device *dst_dev, >> + const struct in6_addr *daddr, >> + unsigned int prefs, >> + struct in6_addr *saddr); >> }; >> extern const struct ipv6_bpf_stub *ipv6_bpf_stub __read_mostly; >> >> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h >> index 0448700890f7..a6bf686eecbc 100644 >> --- a/include/uapi/linux/bpf.h >> +++ b/include/uapi/linux/bpf.h >> @@ -3257,6 +3257,10 @@ union bpf_attr { >> * and *params*->smac will not be set as output. A common >> * use case is to call **bpf_redirect_neigh**\ () after >> * doing **bpf_fib_lookup**\ (). >> + * **BPF_FIB_LOOKUP_SET_SRC** >> + * Derive and set source IP addr in *params*->ipv{4,6}_src >> + * for the nexthop. If the src addr cannot be derived, >> + * **BPF_FIB_LKUP_RET_NO_SRC_ADDR** is returned. >> * >> * *ctx* is either **struct xdp_md** for XDP programs or >> * **struct sk_buff** tc cls_act programs. >> @@ -6953,6 +6957,7 @@ enum { >> BPF_FIB_LOOKUP_OUTPUT = (1U << 1), >> BPF_FIB_LOOKUP_SKIP_NEIGH = (1U << 2), >> BPF_FIB_LOOKUP_TBID = (1U << 3), >> + BPF_FIB_LOOKUP_SET_SRC = (1U << 4), > > bikeshedding: Shorten it to BPF_FIB_LOOKUP_SRC? SGTM. > >> }; >> >> enum { >> @@ -6965,6 +6970,7 @@ enum { >> BPF_FIB_LKUP_RET_UNSUPP_LWT, /* fwd requires encapsulation */ >> BPF_FIB_LKUP_RET_NO_NEIGH, /* no neighbor entry for nh */ >> BPF_FIB_LKUP_RET_FRAG_NEEDED, /* fragmentation required to fwd */ >> + BPF_FIB_LKUP_RET_NO_SRC_ADDR, /* failed to derive IP src addr */ >> }; >> >> struct bpf_fib_lookup { >> @@ -6999,6 +7005,9 @@ struct bpf_fib_lookup { >> __u32 rt_metric; >> }; >> >> + /* input: source address to consider for lookup >> + * output: source address result from lookup >> + */ >> union { >> __be32 ipv4_src; >> __u32 ipv6_src[4]; /* in6_addr; network order */ >> diff --git a/net/core/filter.c b/net/core/filter.c >> index a094694899c9..f3777ef1840b 100644 >> --- a/net/core/filter.c >> +++ b/net/core/filter.c >> @@ -5850,6 +5850,9 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params, >> params->rt_metric = res.fi->fib_priority; >> params->ifindex = dev->ifindex; >> >> + if (flags & BPF_FIB_LOOKUP_SET_SRC) >> + params->ipv4_src = fib_result_prefsrc(net, &res); > > Does it need to check 0 and return BPF_FIB_LKUP_RET_NO_SRC_ADDR for v4 also, > or the fib_result_prefsrc does not fail? >From inspecting the other usages of the function - it does not fail. > >> + >> /* xdp and cls_bpf programs are run in RCU-bh so >> * rcu_read_lock_bh is not needed here >> */ >> @@ -5992,6 +5995,19 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params, >> params->rt_metric = res.f6i->fib6_metric; >> params->ifindex = dev->ifindex; >> >> + if (flags & BPF_FIB_LOOKUP_SET_SRC) { >> + if (res.f6i->fib6_prefsrc.plen) { >> + *(struct in6_addr *)params->ipv6_src = res.f6i->fib6_prefsrc.addr; >> + } else { >> + err = ipv6_bpf_stub->ipv6_dev_get_saddr(net, dev, >> + &fl6.daddr, 0, >> + (struct in6_addr *) >> + params->ipv6_src); >> + if (err) >> + return BPF_FIB_LKUP_RET_NO_SRC_ADDR; > > This error also implies BPF_FIB_LKUP_RET_NO_NEIGH. I don't have a clean way of > improving the API. May be others have some ideas. > > Considering dev has no saddr is probably (?) an unlikely case, it should be ok > to leave it as is but at least a comment in the uapi will be needed. Otherwise, > the bpf prog may use the 0 dmac as-is. I expect that a user of the helper checks that err == 0 before using any of the output params. > > I feel the current bpf_ipv[46]_fib_lookup helper is doing many things > in one > function and then requires different BPF_FIB_LOOKUP_* bits to select > what/how to > do. In the future, it may be worth to consider breaking it into smaller > kfunc(s). e.g. the __ipv[46]_neigh_lookup could be in its own kfunc. > Yep, good idea. At least it seems that the neigh lookup could live in its own function. >> + } >> + } >> + >> if (flags & BPF_FIB_LOOKUP_SKIP_NEIGH) >> goto set_fwd_params; >> >> @@ -6010,7 +6026,8 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params, >> #endif >> >> #define BPF_FIB_LOOKUP_MASK (BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_OUTPUT | \ >> - BPF_FIB_LOOKUP_SKIP_NEIGH | BPF_FIB_LOOKUP_TBID) >> + BPF_FIB_LOOKUP_SKIP_NEIGH | BPF_FIB_LOOKUP_TBID | \ >> + BPF_FIB_LOOKUP_SET_SRC) >> >> BPF_CALL_4(bpf_xdp_fib_lookup, struct xdp_buff *, ctx, >> struct bpf_fib_lookup *, params, int, plen, u32, flags)