Re: [PATCH bpf v2 1/2] bpf: Derive source IP addr via bpf_*_fib_lookup()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Thanks for the review.

On Wed, Oct 4, 2023, at 10:09 PM, Martin KaFai Lau wrote:
> On 10/3/23 12:10 AM, Martynas Pumputis wrote:
>> Extend the bpf_fib_lookup() helper by making it to return the source
>> IPv4/IPv6 address if the BPF_FIB_LOOKUP_SET_SRC flag is set.
>> 
>> For example, the following snippet can be used to derive the desired
>> source IP address:
>> 
>>      struct bpf_fib_lookup p = { .ipv4_dst = ip4->daddr };
>> 
>>      ret = bpf_skb_fib_lookup(skb, p, sizeof(p),
>>              BPF_FIB_LOOKUP_SET_SRC | BPF_FIB_LOOKUP_SKIP_NEIGH);
>>      if (ret != BPF_FIB_LKUP_RET_SUCCESS)
>>          return TC_ACT_SHOT;
>> 
>>      /* the p.ipv4_src now contains the source address */
>> 
>> The inability to derive the proper source address may cause malfunctions
>> in BPF-based dataplanes for hosts containing netdevs with more than one
>> routable IP address or for multi-homed hosts.
>> 
>> For example, Cilium implements packet masquerading in BPF. If an
>> egressing netdev to which the Cilium's BPF prog is attached has
>> multiple IP addresses, then only one [hardcoded] IP address can be used for
>> masquerading. This breaks connectivity if any other IP address should have
>> been selected instead, for example, when a public and private addresses
>> are attached to the same egress interface.
>> 
>> The change was tested with Cilium [1].
>> 
>> Nikolay Aleksandrov helped to figure out the IPv6 addr selection.
>> 
>> [1]: https://github.com/cilium/cilium/pull/28283
>> 
>> Signed-off-by: Martynas Pumputis <m@xxxxxxxxx>
>> ---
>>   include/net/ipv6_stubs.h       |  5 +++++
>>   include/uapi/linux/bpf.h       |  9 +++++++++
>>   net/core/filter.c              | 19 ++++++++++++++++++-
>>   net/ipv6/af_inet6.c            |  1 +
>>   tools/include/uapi/linux/bpf.h | 10 ++++++++++
>>   5 files changed, 43 insertions(+), 1 deletion(-)
>> 
>> diff --git a/include/net/ipv6_stubs.h b/include/net/ipv6_stubs.h
>> index c48186bf4737..21da31e1dff5 100644
>> --- a/include/net/ipv6_stubs.h
>> +++ b/include/net/ipv6_stubs.h
>> @@ -85,6 +85,11 @@ struct ipv6_bpf_stub {
>>   			       sockptr_t optval, unsigned int optlen);
>>   	int (*ipv6_getsockopt)(struct sock *sk, int level, int optname,
>>   			       sockptr_t optval, sockptr_t optlen);
>> +	int (*ipv6_dev_get_saddr)(struct net *net,
>> +				  const struct net_device *dst_dev,
>> +				  const struct in6_addr *daddr,
>> +				  unsigned int prefs,
>> +				  struct in6_addr *saddr);
>>   };
>>   extern const struct ipv6_bpf_stub *ipv6_bpf_stub __read_mostly;
>>   
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 0448700890f7..a6bf686eecbc 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -3257,6 +3257,10 @@ union bpf_attr {
>>    *			and *params*->smac will not be set as output. A common
>>    *			use case is to call **bpf_redirect_neigh**\ () after
>>    *			doing **bpf_fib_lookup**\ ().
>> + *		**BPF_FIB_LOOKUP_SET_SRC**
>> + *			Derive and set source IP addr in *params*->ipv{4,6}_src
>> + *			for the nexthop. If the src addr cannot be derived,
>> + *			**BPF_FIB_LKUP_RET_NO_SRC_ADDR** is returned.
>>    *
>>    *		*ctx* is either **struct xdp_md** for XDP programs or
>>    *		**struct sk_buff** tc cls_act programs.
>> @@ -6953,6 +6957,7 @@ enum {
>>   	BPF_FIB_LOOKUP_OUTPUT  = (1U << 1),
>>   	BPF_FIB_LOOKUP_SKIP_NEIGH = (1U << 2),
>>   	BPF_FIB_LOOKUP_TBID    = (1U << 3),
>> +	BPF_FIB_LOOKUP_SET_SRC = (1U << 4),
>
> bikeshedding: Shorten it to BPF_FIB_LOOKUP_SRC?

SGTM.

>
>>   };
>>   
>>   enum {
>> @@ -6965,6 +6970,7 @@ enum {
>>   	BPF_FIB_LKUP_RET_UNSUPP_LWT,   /* fwd requires encapsulation */
>>   	BPF_FIB_LKUP_RET_NO_NEIGH,     /* no neighbor entry for nh */
>>   	BPF_FIB_LKUP_RET_FRAG_NEEDED,  /* fragmentation required to fwd */
>> +	BPF_FIB_LKUP_RET_NO_SRC_ADDR,  /* failed to derive IP src addr */
>>   };
>>   
>>   struct bpf_fib_lookup {
>> @@ -6999,6 +7005,9 @@ struct bpf_fib_lookup {
>>   		__u32	rt_metric;
>>   	};
>>   
>> +	/* input: source address to consider for lookup
>> +	 * output: source address result from lookup
>> +	 */
>>   	union {
>>   		__be32		ipv4_src;
>>   		__u32		ipv6_src[4];  /* in6_addr; network order */
>> diff --git a/net/core/filter.c b/net/core/filter.c
>> index a094694899c9..f3777ef1840b 100644
>> --- a/net/core/filter.c
>> +++ b/net/core/filter.c
>> @@ -5850,6 +5850,9 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
>>   	params->rt_metric = res.fi->fib_priority;
>>   	params->ifindex = dev->ifindex;
>>   
>> +	if (flags & BPF_FIB_LOOKUP_SET_SRC)
>> +		params->ipv4_src = fib_result_prefsrc(net, &res);
>
> Does it need to check 0 and return BPF_FIB_LKUP_RET_NO_SRC_ADDR for v4 also,
> or the fib_result_prefsrc does not fail?

>From inspecting the other usages of the function - it does not fail.

>
>> +
>>   	/* xdp and cls_bpf programs are run in RCU-bh so
>>   	 * rcu_read_lock_bh is not needed here
>>   	 */
>> @@ -5992,6 +5995,19 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
>>   	params->rt_metric = res.f6i->fib6_metric;
>>   	params->ifindex = dev->ifindex;
>>   
>> +	if (flags & BPF_FIB_LOOKUP_SET_SRC) {
>> +		if (res.f6i->fib6_prefsrc.plen) {
>> +			*(struct in6_addr *)params->ipv6_src = res.f6i->fib6_prefsrc.addr;
>> +		} else {
>> +			err = ipv6_bpf_stub->ipv6_dev_get_saddr(net, dev,
>> +								&fl6.daddr, 0,
>> +								(struct in6_addr *)
>> +								params->ipv6_src);
>> +			if (err)
>> +				return BPF_FIB_LKUP_RET_NO_SRC_ADDR;
>
> This error also implies BPF_FIB_LKUP_RET_NO_NEIGH. I don't have a clean way of 
> improving the API. May be others have some ideas.
>
> Considering dev has no saddr is probably (?) an unlikely case, it should be ok 
> to leave it as is but at least a comment in the uapi will be needed. Otherwise, 
> the bpf prog may use the 0 dmac as-is.

I expect that a user of the helper checks that err == 0 before using any of the output params.

>
> I feel the current bpf_ipv[46]_fib_lookup helper is doing many things 
> in one 
> function and then requires different BPF_FIB_LOOKUP_* bits to select 
> what/how to 
> do. In the future, it may be worth to consider breaking it into smaller 
> kfunc(s). e.g. the __ipv[46]_neigh_lookup could be in its own kfunc.
>

Yep, good idea. At least it seems that the neigh lookup could live in its own function.

>> +		}
>> +	}
>> +
>>   	if (flags & BPF_FIB_LOOKUP_SKIP_NEIGH)
>>   		goto set_fwd_params;
>>   
>> @@ -6010,7 +6026,8 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
>>   #endif
>>   
>>   #define BPF_FIB_LOOKUP_MASK (BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_OUTPUT | \
>> -			     BPF_FIB_LOOKUP_SKIP_NEIGH | BPF_FIB_LOOKUP_TBID)
>> +			     BPF_FIB_LOOKUP_SKIP_NEIGH | BPF_FIB_LOOKUP_TBID | \
>> +			     BPF_FIB_LOOKUP_SET_SRC)
>>   
>>   BPF_CALL_4(bpf_xdp_fib_lookup, struct xdp_buff *, ctx,
>>   	   struct bpf_fib_lookup *, params, int, plen, u32, flags)




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux