On Mon, Mar 16, 2020 at 08:06:38PM -0700, Joe Stringer wrote:
> On Mon, Mar 16, 2020 at 3:58 PM Martin KaFai Lau <kafai@xxxxxx> wrote:
> >
> > On Thu, Mar 12, 2020 at 04:36:44PM -0700, Joe Stringer wrote:
> > > Add support for TPROXY via a new bpf helper, bpf_sk_assign().
> > >
> > > This helper requires the BPF program to discover the socket via a call
> > > to bpf_sk*_lookup_*(), then pass this socket to the new helper. The
> > > helper takes its own reference to the socket in addition to any existing
> > > reference that may or may not currently be obtained for the duration of
> > > BPF processing. For the destination socket to receive the traffic, the
> > > traffic must be routed towards that socket via local route, the socket
> > I also missed where is the local route check in the patch.
> > Is it implied by a sk can be found in bpf_sk*_lookup_*()?
>
> This is a requirement for traffic redirection, it's not enforced by
> the patch. If the operator does not configure routing for the relevant
> traffic to ensure that the traffic is delivered locally, then after
> the eBPF program terminates, it will pass up through ip_rcv() and
> friends and be subject to the whims of the routing table. (or
> alternatively if the BPF program redirects somewhere else then this
> reference will be dropped).
>
> Maybe there's a path to simplifying this configuration path in future
> to loosen this requirement, but for now I've kept the series as
> minimal as possible on that front.
> >
> > [ ... ]
> >
> > > diff --git a/net/core/filter.c b/net/core/filter.c
> > > index cd0a532db4e7..bae0874289d8 100644
> > > --- a/net/core/filter.c
> > > +++ b/net/core/filter.c
> > > @@ -5846,6 +5846,32 @@ static const struct bpf_func_proto bpf_tcp_gen_syncookie_proto = {
> > >  	.arg5_type	= ARG_CONST_SIZE,
> > >  };
> > >
> > > +BPF_CALL_3(bpf_sk_assign, struct sk_buff *, skb, struct sock *, sk, u64, flags)
> > > +{
> > > +	if (flags != 0)
> > > +		return -EINVAL;
> > > +	if (!skb_at_tc_ingress(skb))
> > > +		return -EOPNOTSUPP;
> > > +	if (unlikely(!refcount_inc_not_zero(&sk->sk_refcnt)))
> > > +		return -ENOENT;
> > > +
> > > +	skb_orphan(skb);
> > > +	skb->sk = sk;
> > sk is from the bpf_sk*_lookup_*() which does not consider
> > the bpf_prog installed in SO_ATTACH_REUSEPORT_EBPF.
> > However, the use-case is currently limited to sk inspection.
> >
> > It now supports selecting a particular sk to receive traffic.
> > Any plan in supporting that?
>
> I think this is a general bpf_sk*_lookup_*() question, previous
> discussion[0] settled on avoiding that complexity before a use case
> arises, for both TC and XDP versions of these helpers; I still don't
> have a specific use case in mind for such functionality. If we were to
> do it, I would presume that the socket lookup caller would need to
> pass a dedicated flag (supported at TC and likely not at XDP) to
> communicate that SO_ATTACH_REUSEPORT_EBPF progs should be respected
> and used to select the reuseport socket.
It is more about the expectation on the existing SO_ATTACH_REUSEPORT_EBPF
usecase.  It has been fine because SO_ATTACH_REUSEPORT_EBPF's bpf prog
will still be run later (e.g. from tcp_v4_rcv) to decide which sk
receives the skb.

If the bpf@tc assigns a TCP_LISTEN sk in bpf_sk_assign(),
will the SO_ATTACH_REUSEPORT_EBPF's bpf still be run later
to make the final sk decision?
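
For anyone following the reference handling in the quoted filter.c hunk, here is a minimal userspace sketch of the same check order. It is a toy model, not kernel code: toy_sock, toy_skb, toy_ref_inc_not_zero(), toy_skb_orphan() and toy_sk_assign() are hypothetical stand-ins, the orphan step assumes the skb owner holds exactly one reference, and the final "return 0" is an assumption because the quoted hunk is cut off after the skb->sk assignment.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins, NOT the kernel's struct sock / struct sk_buff. */
struct toy_sock {
	unsigned int refcnt;
};

struct toy_skb {
	struct toy_sock *sk;	/* current owner, if any */
	bool at_tc_ingress;	/* stands in for skb_at_tc_ingress() */
};

/* Take a reference unless the count has already dropped to zero. */
static bool toy_ref_inc_not_zero(unsigned int *refcnt)
{
	if (*refcnt == 0)
		return false;
	(*refcnt)++;
	return true;
}

/*
 * Detach the skb from its current owner.
 * Assumption: the owner link holds exactly one reference.
 */
static void toy_skb_orphan(struct toy_skb *skb)
{
	if (skb->sk) {
		assert(skb->sk->refcnt > 0);
		skb->sk->refcnt--;
		skb->sk = NULL;
	}
}

/* Same check order as the quoted hunk: flags, hook, reference, assign. */
static int toy_sk_assign(struct toy_skb *skb, struct toy_sock *sk,
			 unsigned long long flags)
{
	if (flags != 0)
		return -EINVAL;
	if (!skb->at_tc_ingress)
		return -EOPNOTSUPP;
	if (!toy_ref_inc_not_zero(&sk->refcnt))
		return -ENOENT;

	toy_skb_orphan(skb);
	skb->sk = sk;
	return 0;	/* assumed: the quoted hunk ends before its return */
}
```

In this model a failed call leaves both the skb and the socket untouched, and a successful call leaves the socket with one extra reference that is owned by the skb. The model says nothing about what the stack later does with skb->sk, which is the open question above.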
>
> > > diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
> > > index 7b089d0ac8cd..f7b42adca9d0 100644
> > > --- a/net/ipv6/ip6_input.c
> > > +++ b/net/ipv6/ip6_input.c
> > > @@ -285,7 +285,10 @@ static struct sk_buff *ip6_rcv_core(struct sk_buff *skb, struct net_device *dev,
> > >  	rcu_read_unlock();
> > >
> > >  	/* Must drop socket now because of tproxy. */
> > > -	skb_orphan(skb);
> > > +	if (skb_dst_is_sk_prefetch(skb))
> > > +		dst_sk_prefetch_fetch(skb);
> > > +	else
> > > +		skb_orphan(skb);
> > If I understand it correctly, this new test is to skip
> > the skb_orphan() call for locally routed skb.
> > Other cases (forward?) still depend on skb_orphan() to be called here?
>
> Roughly yes. 'locally routed skb' is a bit loose wording though, at
> this point the BPF program only prefetched the socket to let the stack
> know that it should deliver the skb to that socket, assuming that it
> passes the upcoming routing check.
Which upcoming routing check?  I think it is the part I am missing.

In patch 4, let's say the dst_check() returns NULL (maybe due to a route
change).  Later in the upper stack, it does a route lookup
(ip_route_input_noref() or ip6_route_input()).  Could it return
a forward route?  And I assume missing a skb_orphan() call
here will still be fine?

>
> For more discussion on the other cases, there is the previous
> thread[1] and in particular the child thread discussion with Florian,
> Eric and Daniel.
>
> [0] https://urldefense.proofpoint.com/v2/url?u=https-3A__www.mail-2Darchive.com_netdev-40vger.kernel.org_msg253250.html&d=DwIBaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=VQnoQ7LvghIj0gVEaiQSUw&m=mX45GxyUJ_HfsBIJTVMZY9ztD5rVViDuOIQ0pXtyJcM&s=z5lZSVTonmhT5OeyxsefzUC2fMqDEwFvlEV1qkyrULg&e=
> [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__www.spinics.net_lists_netdev_msg580058.html&d=DwIBaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=VQnoQ7LvghIj0gVEaiQSUw&m=mX45GxyUJ_HfsBIJTVMZY9ztD5rVViDuOIQ0pXtyJcM&s=oFYt8cTKQEc-wEfY5YSsjfVN3QqBlFGfrrT7DTKw1rc&e=