Re: [PATCH bpf-next 1/2] bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 5/25/23 3:24 PM, Eric Dumazet wrote:
On Thu, May 25, 2023 at 10:19 AM Lorenz Bauer <lmb@xxxxxxxxxxxxx> wrote:

Currently the bpf_sk_assign helper in tc BPF context refuses SO_REUSEPORT
sockets. This means we can't use the helper to steer traffic to Envoy, which
configures SO_REUSEPORT on its sockets. In turn, we're blocked from removing
TPROXY from our setup.

The reason that bpf_sk_assign refuses such sockets is that the bpf_sk_lookup
helpers don't execute SK_REUSEPORT programs. Instead, one of the
reuseport sockets is selected by hash. This could cause dispatch to the
"wrong" socket:

     sk = bpf_sk_lookup_tcp(...) // select SO_REUSEPORT by hash
     bpf_sk_assign(skb, sk) // SK_REUSEPORT wasn't executed

Fixing this isn't as simple as invoking SK_REUSEPORT from the lookup
helpers unfortunately. In the tc context, L2 headers are at the start
of the skb, while SK_REUSEPORT expects L3 headers instead.

Instead, we execute the SK_REUSEPORT program when the assigned socket
is pulled out of the skb, further up the stack. This creates some
trickiness with regards to refcounting as bpf_sk_assign will put both
refcounted and RCU freed sockets in skb->sk. reuseport sockets are RCU
freed. We can infer that the sk_assigned socket is RCU freed if the
reuseport lookup succeeds, but convincing yourself of this fact isn't
straight forward. Therefore we defensively check refcounting on the
sk_assign sock even though it's probably not required in practice.

Fixes: 8e368dc ("bpf: Fix use of sk->sk_reuseport from sk_assign")
Fixes: cf7fbe6 ("bpf: Add socket assign support")
Co-developed-by: Daniel Borkmann <daniel@xxxxxxxxxxxxx>
Signed-off-by: Daniel Borkmann <daniel@xxxxxxxxxxxxx>
Signed-off-by: Lorenz Bauer <lmb@xxxxxxxxxxxxx>
Cc: Joe Stringer <joe@xxxxxxxxx>
Link: https://lore.kernel.org/bpf/CACAyw98+qycmpQzKupquhkxbvWK4OFyDuuLMBNROnfWMZxUWeA@xxxxxxxxxxxxxx/
---
  include/net/inet6_hashtables.h | 36 +++++++++++++++++++++++++++++-----
  include/net/inet_hashtables.h  | 27 +++++++++++++++++++++++--
  include/net/sock.h             |  7 +++++--
  include/uapi/linux/bpf.h       |  3 ---
  net/core/filter.c              |  2 --
  net/ipv4/inet_hashtables.c     | 15 +++++++-------
  net/ipv4/udp.c                 | 23 +++++++++++++++++++---
  net/ipv6/inet6_hashtables.c    | 19 +++++++++---------
  net/ipv6/udp.c                 | 23 +++++++++++++++++++---
  tools/include/uapi/linux/bpf.h |  3 ---
  10 files changed, 119 insertions(+), 39 deletions(-)


diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index e7391bf310a7..920131e4a65d 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -332,10 +332,10 @@ static inline int compute_score(struct sock *sk, struct net *net,
         return score;
  }

-static inline struct sock *lookup_reuseport(struct net *net, struct sock *sk,
-                                           struct sk_buff *skb, int doff,
-                                           __be32 saddr, __be16 sport,
-                                           __be32 daddr, unsigned short hnum)
+struct sock *inet_lookup_reuseport(struct net *net, struct sock *sk,
+                                  struct sk_buff *skb, int doff,
+                                  __be32 saddr, __be16 sport,
+                                  __be32 daddr, unsigned short hnum)
  {
         struct sock *reuse_sk = NULL;
         u32 phash;
@@ -346,6 +346,7 @@ static inline struct sock *lookup_reuseport(struct net *net, struct sock *sk,
         }
         return reuse_sk;
  }
+EXPORT_SYMBOL_GPL(inet_lookup_reuseport);

  /*
   * Here are some nice properties to exploit here. The BSD API
@@ -369,8 +370,8 @@ static struct sock *inet_lhash2_lookup(struct net *net,
         sk_nulls_for_each_rcu(sk, node, &ilb2->nulls_head) {
                 score = compute_score(sk, net, hnum, daddr, dif, sdif);
                 if (score > hiscore) {
-                       result = lookup_reuseport(net, sk, skb, doff,
-                                                 saddr, sport, daddr, hnum);
+                       result = inet_lookup_reuseport(net, sk, skb, doff,
+                                                      saddr, sport, daddr, hnum);
                         if (result)
                                 return result;


Please split in a series.

First a patch renaming lookup_reuseport() to inet_lookup_reuseport()
and inet6_lookup_reuseport()
(cleanup, no change in behavior)

This would ease review and future bug hunting quite a bit.

Makes sense and should reduce the churn on the actual change.

I think Lorenz is planning to flush out a v2 next week with this split.

Thanks,
Daniel




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux