On Mon, Sep 27, 2021 at 11:07 AM John Fastabend <john.fastabend@xxxxxxxxx> wrote:
>
> Cong Wang wrote:
> > From: Cong Wang <cong.wang@xxxxxxxxxxxxx>
> >
> > Yucong noticed we can't poll() sockets in sockmap even
> > when they are the destination sockets of redirections.
> > This is because we never poll any psock queues in ->poll().
> > We can not overwrite ->poll() as it is in struct proto_ops,
> > not in struct proto.
> >
> > So introduce sk_msg_poll() to poll psock ingress_msg queue
> > and let sockets which support sockmap invoke it directly.
> >
> > Reported-by: Yucong Sun <sunyucong@xxxxxxxxx>
> > Cc: John Fastabend <john.fastabend@xxxxxxxxx>
> > Cc: Daniel Borkmann <daniel@xxxxxxxxxxxxx>
> > Cc: Jakub Sitnicki <jakub@xxxxxxxxxxxxxx>
> > Cc: Lorenz Bauer <lmb@xxxxxxxxxxxxxx>
> > Signed-off-by: Cong Wang <cong.wang@xxxxxxxxxxxxx>
> > ---
> >  include/linux/skmsg.h |  6 ++++++
> >  net/core/skmsg.c      | 15 +++++++++++++++
> >  net/ipv4/tcp.c        |  2 ++
> >  net/ipv4/udp.c        |  2 ++
> >  net/unix/af_unix.c    |  5 +++++
> >  5 files changed, 30 insertions(+)
> >
> > [...]
> >					  struct sk_buff *skb)
> >  {
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index e8b48df73c85..2eb1a87ba056 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -280,6 +280,7 @@
> >  #include <linux/uaccess.h>
> >  #include <asm/ioctls.h>
> >  #include <net/busy_poll.h>
> > +#include <linux/skmsg.h>
> >
> >  /* Track pending CMSGs. */
> >  enum {
> > @@ -563,6 +564,7 @@ __poll_t tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
> >
> >  	if (tcp_stream_is_readable(sk, target))
> >  		mask |= EPOLLIN | EPOLLRDNORM;
> > +	mask |= sk_msg_poll(sk);
> >
> >  	if (!(sk->sk_shutdown & SEND_SHUTDOWN)) {
> >  		if (__sk_stream_is_writeable(sk, 1)) {
> >
>
> For TCP we implement the stream_memory_read() hook which we implement in
> tcp_bpf.c with tcp_bpf_stream_read. This just checks psock->ingress_msg
> list which should cover any redirect from skmsg into the ingress side
> of another socket.
>
> And the tcp_poll logic is using tcp_stream_is_readable() which is
> checking for sk->sk_prot->stream_memory_read() and then calling it.

Ah, I missed it. It is better to have such a hook in struct proto,
since we can just overwrite it with bpf hooks. Let me rename it so it
is not TCP-specific and implement it for UDP and AF_UNIX too.

>
> The straight receive path, e.g. not redirected from a sender should
> be covered by the normal tcp_epollin_ready() checks because this
> would be after TCP does the normal updates to rcv_nxt, copied_seq,
> etc.

Yes.

>
> So above is not in the TCP case by my reading. Did I miss a
> case? We also have done tests with Envoy which I thought were polling
> so I'll check on that as well.

Right, all of these selftests in patch 3/3 are non-TCP.

Thanks.
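To make the hook-based approach discussed above concrete, here is a small userspace C model of the dispatch John describes: tcp_poll() asks tcp_stream_is_readable(), which checks the normal receive path first and then falls back to the optional sk->sk_prot->stream_memory_read() hook, which the BPF side fills in (tcp_bpf_stream_read in tcp_bpf.c) to report a non-empty psock->ingress_msg list. This is a sketch under stated assumptions, not the kernel implementation: every sketch_* type and function, the simplified queue-length counters, and the SKETCH_EPOLL* values are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative poll bits (hypothetical names, not the uapi EPOLL* macros). */
#define SKETCH_EPOLLIN     0x001u
#define SKETCH_EPOLLRDNORM 0x040u

struct sketch_sock;

/* Hypothetical stand-in for the relevant slice of struct proto: an optional
 * hook reporting data that sits in protocol-private queues. */
struct sketch_proto {
	bool (*stream_memory_read)(const struct sketch_sock *sk);
};

/* Hypothetical stand-in for struct sk_psock: only the ingress_msg length matters. */
struct sketch_psock {
	size_t ingress_msg_len;
};

struct sketch_sock {
	const struct sketch_proto *sk_prot;
	size_t rcv_queue_len;       /* data on the normal receive queue */
	struct sketch_psock *psock; /* non-NULL once the socket is in a sockmap */
};

/* Plays the role of tcp_bpf_stream_read(): readable if redirected
 * messages are waiting on the psock ingress list. */
static bool sketch_bpf_stream_read(const struct sketch_sock *sk)
{
	return sk->psock != NULL && sk->psock->ingress_msg_len > 0;
}

/* Plays the role of tcp_stream_is_readable(): normal receive path first,
 * then the optional per-proto hook. */
static bool sketch_is_readable(const struct sketch_sock *sk)
{
	if (sk->rcv_queue_len > 0)
		return true;
	if (sk->sk_prot->stream_memory_read)
		return sk->sk_prot->stream_memory_read(sk);
	return false;
}

/* The "readable" half of a ->poll() implementation. */
static unsigned int sketch_poll(const struct sketch_sock *sk)
{
	unsigned int mask = 0;

	assert(sk != NULL && sk->sk_prot != NULL);
	if (sketch_is_readable(sk))
		mask |= SKETCH_EPOLLIN | SKETCH_EPOLLRDNORM;
	return mask;
}

/* The default proto leaves the hook empty; the BPF variant fills it in,
 * modeling how sockmap swaps sk->sk_prot when a socket is added. */
static const struct sketch_proto sketch_base_proto = { .stream_memory_read = NULL };
static const struct sketch_proto sketch_bpf_proto = { .stream_memory_read = sketch_bpf_stream_read };
```

For example, a socket whose sk_prot is sketch_bpf_proto and whose psock holds one queued message reports SKETCH_EPOLLIN | SKETCH_EPOLLRDNORM even with an empty receive queue, while the same socket under sketch_base_proto reports nothing. Because the hook lives in the per-socket proto table rather than in struct proto_ops, overriding it is enough for ->poll() to see redirected messages, which is the reason for extending the same hook to UDP and AF_UNIX instead of calling a helper directly from each ->poll().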