On Tue, Feb 6, 2024 at 12:43 AM Paolo Abeni <pabeni@xxxxxxxxxx> wrote:
>
> What about 'destination/port unreachable' and many other similar errors
> reported by sk_err? Which specific errors reported by sk_err does not
> indicate that anything is wrong with the socket ?

Destination/port unreachable are *exactly* the primary offenders.
Consider:

1. TCP socket. If the peer becomes unreachable, the connection is
unusable. Maybe reading previously queued data is reasonable; maybe
it's not, but one way or another the connection isn't working any
more. The current API seems okay.

2. UDP peer-to-peer connection. I have a socket and it's connected to
a peer. The peer sends an ICMP error or a route changes and the kernel
can't route to the peer. The connection is at least temporarily dead.
If we accept that temporarily dead equals permanently dead, then
returning error codes makes sense. Even if we expect the application
to try to recover without making a new socket, telling the application
seems fine. The application will understand that an error occurred
communicating with its peer and can do something about it.

3. UDP *server* with multiple clients. (Or unconnected UDP socket
communicating with multiple peers, etc.) Imagine a DNS server or a
QUIC server -- I hear QUIC is cool lately. A userspace server has a
socket, and it does sendto() or sendmsg() to a whole bunch of
addresses. One of them sends an ICMP error. There are multiple things
the server might do. It might ignore the error entirely and treat it
just like a timeout, because it probably already has perfectly nice
timeout handling. Or it might want to know that there was an error
communicating with a *specific* peer and release resources sooner than
it would for a timeout. Or it might want to collect the entire ICMP
error (via RECVERR) and do something useful with it.
But it gets no value whatsoever from knowing that an unspecified peer
sent an ICMP error, and it gets negative value from having a call to
recvfrom() or recvmsg() fail and needing to look up in some
hopefully-correct table whether the failure indicates an actual
problem (EFAULT, for example) or a completely useless return value
that should be ignored (EHOSTUNREACH).

(#3 is probably worse if the application uses one-shot notifications
-- the application needs to make a decision as to whether to call
recvfrom/recvmsg again or go back to polling.)

--Andy
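
To make the "hopefully-correct table" concrete, here is a rough sketch
of the kind of thing a UDP server ends up carrying around today. The
names is_async_peer_error() and recvfrom_skip_peer_errors() are made up
for illustration, and the errno list is an assumption on my part: it
covers the common unreachable cases and is almost certainly not
complete, which is rather the point.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical table: true if err looks like a queued, ICMP-derived
 * error about *some* peer rather than a problem with this call.
 * Assumed list, not exhaustive.
 */
bool is_async_peer_error(int err)
{
    switch (err) {
    case ECONNREFUSED:  /* typically ICMP port unreachable */
    case EHOSTUNREACH:  /* typically ICMP host unreachable */
    case ENETUNREACH:   /* typically ICMP network unreachable */
    case EHOSTDOWN:
        return true;
    default:
        return false;   /* EFAULT, EBADF, EAGAIN, ...: caller must handle */
    }
}

/*
 * Hypothetical wrapper: receive one datagram, retrying when the failure
 * only says "an unspecified peer was unreachable". Skips at most
 * max_skips such errors so a burst of ICMP errors cannot spin forever.
 * Returns the datagram length, or -1 with errno set.
 */
ssize_t recvfrom_skip_peer_errors(int fd, void *buf, size_t len,
                                  struct sockaddr *src, socklen_t *srclen,
                                  int max_skips)
{
    /* recvfrom() overwrites *srclen, so remember the buffer size. */
    socklen_t cap = srclen ? *srclen : 0;

    for (;;) {
        if (srclen)
            *srclen = cap;

        ssize_t n = recvfrom(fd, buf, len, 0, src, srclen);
        if (n >= 0)
            return n;

        if (errno == EINTR)
            continue;   /* interrupted by a signal, just retry */
        if (is_async_peer_error(errno) && max_skips-- > 0)
            continue;   /* useless-to-us error, try the next datagram */

        return -1;      /* real failure, or EAGAIN on a nonblocking fd */
    }
}
```

A server loop would call recvfrom_skip_peer_errors() wherever it calls
recvfrom() now. Note that this only hides the problem: the retry is
exactly the "call again or go back to polling" decision from the
one-shot case, and the table is only as good as whoever last updated
it.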