On Tue, Nov 26, 2019 at 03:30:57PM +0100, Jakub Sitnicki wrote: > On Mon, Nov 25, 2019 at 11:07 PM CET, Martin Lau wrote: > > On Mon, Nov 25, 2019 at 11:40:41AM +0100, Jakub Sitnicki wrote: > >> On Mon, Nov 25, 2019 at 05:17 AM CET, John Fastabend wrote: > >> > Alexei Starovoitov wrote: > >> >> On Sat, Nov 23, 2019 at 12:07:48PM +0100, Jakub Sitnicki wrote: > >> >> > SOCKMAP now supports storing references to listening sockets. Nothing keeps > >> >> > us from using it as an array of sockets to select from in SK_REUSEPORT > >> >> > programs. > >> >> > > >> >> > Whitelist the map type with the BPF helper for selecting socket. However, > >> >> > impose a restriction that the selected socket needs to be a listening TCP > >> >> > socket or a bound UDP socket (connected or not). > >> >> > > >> >> > The only other map type that works with the BPF reuseport helper, > >> >> > REUSEPORT_SOCKARRAY, has a corresponding check in its update operation > >> >> > handler. > >> >> > > >> >> > Signed-off-by: Jakub Sitnicki <jakub@xxxxxxxxxxxxxx> > >> >> > --- > >> > > >> > [...] > >> > > >> >> > diff --git a/net/core/filter.c b/net/core/filter.c > >> >> > index 49ded4a7588a..e3fb77353248 100644 > >> >> > --- a/net/core/filter.c > >> >> > +++ b/net/core/filter.c > >> >> > @@ -8723,6 +8723,8 @@ BPF_CALL_4(sk_select_reuseport, struct sk_reuseport_kern *, reuse_kern, > >> >> > selected_sk = map->ops->map_lookup_elem(map, key); > >> >> > if (!selected_sk) > >> >> > return -ENOENT; > >> >> > + if (!sock_flag(selected_sk, SOCK_RCU_FREE)) > >> >> > + return -EINVAL; > > If I read it correctly, > > this is to avoid the following "if (!reuse)" to return -ENOENT, > > and instead returns -EINVAL for non TCP_LISTEN tcp_sock. > > It should at least only be done under the "if (!reuse)" then. > > Yes, exactly. For an established TCP socket in SOCKMAP we would get > -ENOENT because sk_reuseport_cb is not set. Which is a bit confusing > since the map entry exists. > > Returning -EINVAL matches the REUSEPORT_SOCKARRAY update operation > semantics for established TCP sockets. > > But this is just about returning an informative error so you're > completely right that this should be done under "if (!reuse)" branch to > avoid the extra cost on the happy path. > > > Checking SOCK_RCU_FREE to imply TCP_LISTEN is not ideal. > > It is not immediately obvious. Why not directly check > > TCP_LISTEN? > > I agree, it's not obvious. When I first saw this check in > reuseport_array_update_check it got me puzzled too. I should have added > an explanatory comment there. > > Thing is we're not matching on just TCP_LISTEN. REUSEPORT_SOCKARRAY > allows selecting a connected UDP socket as a target as well. It takes > some effort to set up but it's possible even if obscure. How about this instead: if (!reuse) /* reuseport_array only has sk that has non NULL sk_reuseport_cb. * The only (!reuse) case here is, the sk has already been removed from * reuseport_array, so treat it as -ENOENT. * * Other maps (e.g. sock_map) do not provide this guarantee and the sk may * never be in the reuseport to begin with. */ return map->map_type == BPF_MAP_TYPE_REUSEPORT_SOCKARRAY ? -ENOENT : -EINVAL; > > > Note that the SOCK_RCU_FREE check at the 'slow-path' > > reuseport_array_update_check() is because reuseport_array does depend on > > call_rcu(&sk->sk_rcu,...) to work, e.g. the reuseport_array > > does not hold the sk_refcnt. > > Oh, so it's not only about socket state like I thought. > > This raises the question - does REUSEPORT_SOCKARRAY allow storing > connected UDP sockets by design or is it a happy accident? It doesn't > seem particularly useful. Not by design/accident on the REUSEPORT_SOCKARRAY side ;) The intention of REUSEPORT_SOCKARRAY is to allow sk that can be added to reuse->socks[].