> On Feb 5, 2024, at 3:03 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>
> Hi all-
>
> I encounter this issue every couple of years, and it still seems to be
> an issue, and it drives me nuts every time I see it.
>
> I write software that uses unconnected datagram-style sockets. Errors
> happen for all kinds of reasons, and my software knows it. My
> software even handles the errors and moves on with its life. I use
> MSG_ERRQUEUE to understand the errors. But the kernel fights back:
>
> struct sk_buff *__skb_try_recv_datagram(struct sock *sk,
>                                         struct sk_buff_head *queue,
>                                         unsigned int flags, int *off, int *err,
>                                         struct sk_buff **last)
> {
>     struct sk_buff *skb;
>     unsigned long cpu_flags;
>     /*
>      * Caller is allowed not to check sk->sk_err before skb_recv_datagram()
>      */
>     int error = sock_error(sk);
>
>     if (error)
>         goto no_packet;
>         ^^^^^^^^^^ <----- EXCUSE ME?
>
> The kernel even fights back on the *send* path?!?
>
> static long sock_wait_for_wmem(struct sock *sk, long timeo)
> {
>     DEFINE_WAIT(wait);
>
>     sk_clear_bit(SOCKWQ_ASYNC_NOSPACE, sk);
>     for (;;) {
>         if (!timeo)
>             break;
>         if (signal_pending(current))
>             break;
>         set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
> ...
>         if (READ_ONCE(sk->sk_err))
>             break; <-- KERNEL HATES UNCONNECTED SOCKETS!
>
> This is IMO just broken. I realize it's legacy behavior, but it's
> BROKEN legacy behavior. sk_err does not (at least for an unconnected
> socket) indicate that anything is wrong with the socket. It indicates
> that something is worthy of notice, and it wants to tell me.
>
> So:
>
> 1. sock_wait_for_wmem should IMO just not do that on an unconnected
> socket. AFAICS it's simply a bug.
>
> 2. How, exactly, am I supposed to call recvmsg() and, unambiguously,
> find out whether recvmsg() actually failed? There are actual errors
> (something that indicates that the kernel malfunctioned or the socket
> is broken), errors indicating that the packet being received is busted
> (skb_copy_datagram_msg, for example), and also errors indicating that
> there's an error queued up.
>
> I would like to know that there's an error queued up. That's what
> poll and epoll are for, right? Or a hint from recvmsg() that I should
> call MSG_RECVERR too. Or it could have a mode where it returns a
> normal datagram *or* an error as appropriate. But the current state
> of affairs is just brittle and racy.
>
> Are there any reasonably implementable, non-breaking ways to improve
> the API so that programs that understand socket errors can actually
> function fully correctly without gnarly retry loops in userspace and
> silly heuristics about what errors are actually errors?

Contemplating this, recvmsg() can send status information back via msg_flags. Maybe we could characterize a recvmsg() call as doing one of the following things:

1. Actually fails, via -EFAULT or otherwise. Userspace can get an errno but doesn’t know beyond that what actually went wrong. Should never happen in a correct program. ENOMEM is not in this category.

2. There is nothing to receive. This is -EAGAIN.

3. Received an sk_err error. This is a *success*, and it comes with an error code. Users of RECVERR can’t reliably correlate this with an ERRQUEUE message. Maybe they don’t care.

4. Received a datagram.

5. Received a queued error message a la ERRQUEUE.

6. Dequeued a datagram (or ERRQUEUE) but did *not* receive it due to a checksum error or other error. (And there should be a clear indication of whether the call succeeded but something was wrong with the message or whether the call *failed* for an unexpected reason but the offending message was nonetheless removed from the socket’s queue).

Maybe 7: Received a message (or ERRQUEUE), and the checksum was wrong, but the data is being returned anyway.
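
For concreteness, here is roughly what sorting a recvmsg() result into the categories above looks like with today's API. This is only a sketch: the enum and classify_recvmsg() are made-up names, not anything in the kernel or libc. The one part it can rely on is that recvmsg(2) sets MSG_ERRQUEUE in msg_flags when it returns an error-queue message. The errno split between "the call failed" and "probably sk_err" is a guess, which is exactly the kind of heuristic I'd like to get rid of.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical names for illustration only. Numbers refer to the list above. */
enum recv_outcome {
    RECV_FAILED,           /* 1: the call itself failed (EFAULT and friends) */
    RECV_NOTHING,          /* 2: nothing to receive */
    RECV_MAYBE_SOCK_ERROR, /* 3: probably a pending sk_err, but only a guess */
    RECV_DATAGRAM,         /* 4: a normal datagram */
    RECV_QUEUED_ERROR,     /* 5: a message from the error queue */
    /* 6 and 7 cannot be told apart from userspace today, so they are absent. */
};

/*
 * ret is the return value of recvmsg(), err is errno sampled right after
 * the call, and msg is the msghdr that was passed in.
 */
static enum recv_outcome classify_recvmsg(ssize_t ret, int err,
                                          const struct msghdr *msg)
{
    assert(msg != NULL);

    if (ret >= 0) {
        /* Documented: MSG_ERRQUEUE is set when the error queue was read. */
        return (msg->msg_flags & MSG_ERRQUEUE) ? RECV_QUEUED_ERROR
                                               : RECV_DATAGRAM;
    }

    if (err == EAGAIN || err == EWOULDBLOCK)
        return RECV_NOTHING;

    switch (err) {
    case EINTR:
        /* Interrupted before anything was returned; the caller retries. */
        return RECV_NOTHING;
    case EBADF:
    case EFAULT:
    case EINVAL:
    case ENOTSOCK:
        /* Assumption: these point at a caller bug, not at the network. */
        return RECV_FAILED;
    default:
        /*
         * ECONNREFUSED, EHOSTUNREACH, ... most likely came from sk_err,
         * but nothing in the return value says so. ENOMEM also lands
         * here, which shows how weak this heuristic is.
         */
        return RECV_MAYBE_SOCK_ERROR;
    }
}
```

A caller would run this after every recvmsg() and, on RECV_MAYBE_SOCK_ERROR, go read the error queue with MSG_ERRQUEUE and hope the two can be matched up.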

I suppose that a flag could enable this mode and then all but #1 would return a *success* code from the syscall. And msg_flags would contain an indication as to what actually happened.

Thoughts? Does io_uring affect any of this?

>
> Grumpily,
> Andy