Hi all-

I encounter this issue every couple of years, and it still seems to be
an issue, and it drives me nuts every time I see it.

I write software that uses unconnected datagram-style sockets.  Errors
happen for all kinds of reasons, and my software knows it.  My software
even handles the errors and moves on with its life.  I use MSG_ERRQUEUE
to understand the errors.  But the kernel fights back:

struct sk_buff *__skb_try_recv_datagram(struct sock *sk,
					struct sk_buff_head *queue,
					unsigned int flags, int *off, int *err,
					struct sk_buff **last)
{
	struct sk_buff *skb;
	unsigned long cpu_flags;
	/*
	 * Caller is allowed not to check sk->sk_err before skb_recv_datagram()
	 */
	int error = sock_error(sk);

	if (error)
		goto no_packet;
		^^^^^^^^^^ <----- EXCUSE ME?

The kernel even fights back on the *send* path?!?

static long sock_wait_for_wmem(struct sock *sk, long timeo)
{
	DEFINE_WAIT(wait);

	sk_clear_bit(SOCKWQ_ASYNC_NOSPACE, sk);
	for (;;) {
		if (!timeo)
			break;
		if (signal_pending(current))
			break;
		set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
		...
		if (READ_ONCE(sk->sk_err))
			break;	<-- KERNEL HATES UNCONNECTED SOCKETS!

This is IMO just broken.  I realize it's legacy behavior, but it's
BROKEN legacy behavior.  sk_err does not (at least for an unconnected
socket) indicate that anything is wrong with the socket.  It indicates
that something is worthy of notice, and it wants to tell me.

So:

1. sock_wait_for_wmem should IMO just not do that on an unconnected
socket.  AFAICS it's simply a bug.

2. How, exactly, am I supposed to call recvmsg() and, unambiguously,
find out whether recvmsg() actually failed?  There are actual errors
(something that indicates that the kernel malfunctioned or the socket
is broken), errors indicating that the packet being received is busted
(skb_copy_datagram_msg, for example), and also errors indicating that
there's an error queued up.

I would like to know that there's an error queued up.  That's what poll
and epoll are for, right?
Or a hint from recvmsg() that I should call recvmsg() with MSG_ERRQUEUE
too.  Or it could have a mode where it returns a normal datagram *or* an
error as appropriate.  But the current state of affairs is just brittle
and racy.

Are there any reasonably implementable, non-breaking ways to improve
the API so that programs that understand socket errors can actually
function fully correctly without gnarly retry loops in userspace and
silly heuristics about what errors are actually errors?

Grumpily,
Andy
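
P.S. For concreteness, here is a minimal sketch of the kind of userspace
retry loop I mean.  It assumes an unconnected IPv4 UDP socket with
IP_RECVERR enabled.  The names open_errqueue_socket(), drain_errqueue()
and recv_datagram(), the rx_result enum, and the 16-attempt cap are all
made up for illustration.  The "did the error queue have anything in
it?" check is exactly the racy heuristic complained about above, not a
fix for it.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

enum rx_result { RX_DATAGRAM, RX_TIMEOUT, RX_FAILED };

#define RX_MAX_ATTEMPTS 16	/* arbitrary cap so the loop cannot spin forever */

/* Illustrative helper: unconnected UDP socket on loopback with IP_RECVERR. */
static int open_errqueue_socket(struct sockaddr_in *bound)
{
	int on = 1;
	socklen_t len = sizeof(*bound);
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return -1;
	memset(bound, 0, sizeof(*bound));
	bound->sin_family = AF_INET;
	bound->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	if (setsockopt(fd, IPPROTO_IP, IP_RECVERR, &on, sizeof(on)) < 0 ||
	    bind(fd, (struct sockaddr *)bound, sizeof(*bound)) < 0 ||
	    getsockname(fd, (struct sockaddr *)bound, &len) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Throw away everything on the error queue; return how many entries we ate. */
static int drain_errqueue(int fd)
{
	int drained = 0;

	for (;;) {
		char payload[512], control[512];
		struct iovec iov = { .iov_base = payload, .iov_len = sizeof(payload) };
		struct msghdr msg = {
			.msg_iov = &iov, .msg_iovlen = 1,
			.msg_control = control, .msg_controllen = sizeof(control),
		};

		if (recvmsg(fd, &msg, MSG_ERRQUEUE | MSG_DONTWAIT) < 0)
			break;	/* normally EAGAIN: the queue is empty */
		/* A real program would walk the cmsgs here and act on them. */
		drained++;
	}
	return drained;
}

/* Wait for one datagram, guessing which recv() failures are "real". */
static enum rx_result recv_datagram(int fd, void *buf, size_t len,
				    ssize_t *out_len, int timeout_ms)
{
	assert(out_len != NULL);

	for (int attempt = 0; attempt < RX_MAX_ATTEMPTS; attempt++) {
		struct pollfd pfd = { .fd = fd, .events = POLLIN };
		int ready = poll(&pfd, 1, timeout_ms);	/* timeout applies per attempt */

		if (ready < 0 && errno != EINTR)
			return RX_FAILED;
		if (ready == 0)
			return RX_TIMEOUT;
		if (ready < 0)
			continue;	/* interrupted by a signal, try again */

		/* POLLERR is reported even if not requested; clear queued errors. */
		if (pfd.revents & POLLERR)
			drain_errqueue(fd);

		ssize_t n = recv(fd, buf, len, MSG_DONTWAIT);
		if (n >= 0) {
			*out_len = n;
			return RX_DATAGRAM;
		}
		if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
			continue;

		/* Heuristic: if an error was queued, assume recv() only saw sk_err. */
		int saved = errno;
		if (drain_errqueue(fd) == 0) {
			errno = saved;
			return RX_FAILED;
		}
	}
	errno = EAGAIN;
	return RX_TIMEOUT;	/* gave up without ever seeing a datagram */
}
```

A caller would do something like recv_datagram(fd, buf, sizeof(buf), &n,
1000) and treat only RX_FAILED as a broken socket.  Note that an error
can still land between the drain and the recv(), which is why this
needs a loop at all.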