On Mon, May 11, 2020 at 10:54 PM CEST, Martin KaFai Lau wrote: > On Mon, May 11, 2020 at 09:26:02PM +0200, Jakub Sitnicki wrote: >> On Mon, May 11, 2020 at 08:59 PM CEST, Martin KaFai Lau wrote: >> > On Mon, May 11, 2020 at 11:08:15AM +0200, Jakub Sitnicki wrote: >> >> On Fri, May 08, 2020 at 08:39 PM CEST, Martin KaFai Lau wrote: >> >> > On Fri, May 08, 2020 at 12:45:14PM +0200, Jakub Sitnicki wrote: >> >> >> On Fri, May 08, 2020 at 09:06 AM CEST, Martin KaFai Lau wrote: >> >> >> > On Wed, May 06, 2020 at 02:54:58PM +0200, Jakub Sitnicki wrote: >> >> >> >> [...] >> >> >> >> >> >> + return -ESOCKTNOSUPPORT; >> >> >> >> + >> >> >> >> + /* Check if socket is suitable for packet L3/L4 protocol */ >> >> >> >> + if (sk->sk_protocol != ctx->protocol) >> >> >> >> + return -EPROTOTYPE; >> >> >> >> + if (sk->sk_family != ctx->family && >> >> >> >> + (sk->sk_family == AF_INET || ipv6_only_sock(sk))) >> >> >> >> + return -EAFNOSUPPORT; >> >> >> >> + >> >> >> >> + /* Select socket as lookup result */ >> >> >> >> + ctx->selected_sk = sk; >> >> >> > Could sk be a TCP_ESTABLISHED sk? >> >> >> >> >> >> Yes, and what's worse, it could be ref-counted. This is a bug. I should >> >> >> be rejecting ref counted sockets here. >> >> > Agree. ref-counted (i.e. checking rcu protected or not) is the right check >> >> > here. >> >> > >> >> > An unrelated quick thought, it may still be fine for the >> >> > TCP_ESTABLISHED tcp_sk returned from sock_map because of the >> >> > "call_rcu(&psock->rcu, sk_psock_destroy);" in sk_psock_drop(). >> >> > I was more thinking about in the future, what if this helper can take >> >> > other sk not coming from sock_map. >> >> >> >> I see, psock holds a sock reference and will not release it until a full >> >> grace period has elapsed. >> >> >> >> Even if holding a ref wasn't a problem, I'm not sure if returning a >> >> TCP_ESTABLISHED socket wouldn't trip up callers of inet_lookup_listener >> >> (tcp_v4_rcv and nf_tproxy_handle_time_wait4), that look for a listener >> >> when processing a SYN to TIME_WAIT socket. >> > Not suggesting the sk_assign helper has to support TCP_ESTABLISHED in TCP >> > if there is no use case for it. >> >> Ack, I didn't think you were. Just explored the consequences. >> >> > Do you have a use case on supporting TCP_ESTABLISHED sk in UDP? >> > From the cover letter use cases, it is not clear to me it is >> > required. >> > >> > or both should only support unconnected sk? >> >> No, we don't have a use case for selecting a connected UDP socket. >> >> I left it as a possiblity because __udp[46]_lib_lookup, where BPF >> sk_lookup is invoked from, can return one. >> >> Perhaps the user would like to connect the selected receiving socket >> (for instance to itself) to ensure its not used for TX? >> >> I've pulled this scenario out of the hat. Happy to limit bpf_sk_assign >> to select only unconnected UDP sockets, if returning a connected one >> doesn't make sense. > OTOH, my concern is: > TCP's SK_LOOKUP can override the kernel choice on TCP_LISTEN sk. > UDP's SK_LOOKUP can override the kernel choice on unconnected sk but > not the connected sk. > > It could be quite confusing to bpf user if a bpf_prog was written to return > both connected and unconnected UDP sk and logically expect both > will be done before the kernel's choice. > That's a fair point. I've been looking at this from the PoV of in-kernel callers of udp socket lookup, which now seems wrong. I agree it would a be surprising if not confusing UAPI. Will limit it to just unconnected UDP in v3. Thanks for raising the concern, Jakub