On Mon, 2024-02-05 at 15:03 -0800, Andy Lutomirski wrote:
> Hi all-
>
> I encounter this issue every couple of years, and it still seems to be
> an issue, and it drives me nuts every time I see it.
>
> I write software that uses unconnected datagram-style sockets.  Errors
> happen for all kinds of reasons, and my software knows it.  My
> software even handles the errors and moves on with its life.  I use
> MSG_ERRQUEUE to understand the errors.  But the kernel fights back:
>
> struct sk_buff *__skb_try_recv_datagram(struct sock *sk,
>                     struct sk_buff_head *queue,
>                     unsigned int flags, int *off, int *err,
>                     struct sk_buff **last)
> {
>     struct sk_buff *skb;
>     unsigned long cpu_flags;
>     /*
>      * Caller is allowed not to check sk->sk_err before skb_recv_datagram()
>      */
>     int error = sock_error(sk);
>
>     if (error)
>         goto no_packet;
>         ^^^^^^^^^^ <----- EXCUSE ME?
>
> The kernel even fights back on the *send* path?!?
>
> static long sock_wait_for_wmem(struct sock *sk, long timeo)
> {
>     DEFINE_WAIT(wait);
>
>     sk_clear_bit(SOCKWQ_ASYNC_NOSPACE, sk);
>     for (;;) {
>         if (!timeo)
>             break;
>         if (signal_pending(current))
>             break;
>         set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
> ...
>         if (READ_ONCE(sk->sk_err))
>             break; <-- KERNEL HATES UNCONNECTED SOCKETS!
>
> This is IMO just broken.  I realize it's legacy behavior, but it's
> BROKEN legacy behavior.

As you noted, this is an established behaviour exposed to user-space,
and we can't simply change it, regardless of its own (eventual lack of)
merit.

> sk_err does not (at least for an unconnected
> socket) indicate that anything is wrong with the socket.

What about 'destination/port unreachable' and many other similar errors
reported by sk_err? Which specific errors reported by sk_err do not
indicate that anything is wrong with the socket?

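For reference, user-space can already read and reset the pending sk_err
without any kernel change: getsockopt() with SO_ERROR returns the pending
error and clears it. A minimal sketch, assuming Linux and an already
created datagram socket; the helper name take_pending_error() is made up
for illustration and is not an existing API:

```c
#include <errno.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (name is illustrative only): fetch and clear the
 * pending error on a socket via SO_ERROR.
 *
 * Returns the pending errno value (0 if none was pending), or -1 if
 * getsockopt() itself failed, in which case errno is set.
 */
static int take_pending_error(int fd)
{
	int err = 0;
	socklen_t len = sizeof(err);

	if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
		return -1;

	return err;
}
```

Calling this before recvmsg()/sendmsg() only narrows the window: a new
error can still be reported between the two calls, so the caller has to
tolerate the failure and retry anyway.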

I guess that if you really want to ignore socket errors for datagram
sockets at recvmsg()/sendmsg() time, you could implement some new socket
option to conditionally enable such behaviour on a per-socket basis.

Cheers,

Paolo