Re: The sk_err mechanism is infuriating in userspace

Andy Lutomirski <luto@xxxxxxxxxxxxxx> · Tue, 6 Feb 2024 09:24:45 -0800

On Tue, Feb 6, 2024 at 12:43 AM Paolo Abeni <pabeni@xxxxxxxxxx> wrote:
>
> What about 'destination/port unreachable' and many other similar errors
> reported by sk_err? Which specific errors reported by sk_err does not
> indicate that anything is wrong with the socket ?

Destination/port unreachable are *exactly* the primary offenders.  Consider:

1. TCP socket.  If the peer becomes unreachable, the connection is
unusable.  Maybe reading previously queued data is reasonable; maybe
it's not, but one way or another the connection isn't working any
more.  The current API seems okay.

2. UDP peer-to-peer connection.  I have a socket and it's connected to
a peer.  The peer sends an ICMP error or a route changes and the
kernel can't route to the peer.  The connection is at least
temporarily dead.  If we accept that temporarily dead equals
permanently dead, then returning error codes makes sense.  Even if we
expect the application to try to recover without making a new socket,
telling the application seems fine.  The application will understand
that an error occurred communicating with its peer and can do
something about it.

3. UDP *server* with multiple clients.  (Or unconnected UDP socket
communicating with multiple peers, etc.)  Imagine a DNS server or a
QUIC server -- I hear QUIC is cool lately.  A userspace server has a
socket, and it does sendto() or sendmsg() to a whole bunch of
addresses.  One of them sends an ICMP error.  There are multiple
things the server might do.  It might ignore the error entirely and
treat it just like a timeout, because it probably already has
perfectly nice timeout handling.  Or it might want to know that there
was an error communicating with a *specific* peer and release
resources sooner than it would for a timeout.  Or it might want to
collect the entire ICMP error (via IP_RECVERR) and do something useful
with it.  But it gets no value whatsoever from knowing that an
unspecified peer sent an ICMP error, and it gets negative value from
having a call to recvfrom() or recvmsg() fail and needing to look up
in some hopefully-correct table whether the failure indicates an
actual problem (EFAULT, for example) or a completely useless return
value that should be ignored (EHOSTUNREACH).
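
To make the "hopefully-correct table" concrete, here is a rough sketch
of what a server ends up carrying around.  The names (recv_verdict,
classify_recv_errno) are made up for illustration, and the list of
ICMP-derived errno values is my assumption and certainly not
exhaustive, which is rather the point:

```c
#include <errno.h>

/* Hypothetical helper, not an existing API: what should a multi-peer
 * UDP server do with an errno from a failed recvfrom()/recvmsg()? */
enum recv_verdict {
    RECV_RETRY,  /* says nothing useful about the socket; call recv again */
    RECV_WAIT,   /* nothing queued; go back to poll()/epoll_wait() */
    RECV_FATAL   /* real problem with the call or the socket */
};

static enum recv_verdict classify_recv_errno(int err)
{
    /* Nonblocking socket with an empty queue. */
    if (err == EAGAIN || err == EWOULDBLOCK)
        return RECV_WAIT;

    /* Interrupted by a signal: unrelated to any peer, just retry. */
    if (err == EINTR)
        return RECV_RETRY;

    /* Errors that an ICMP message about some earlier send can leave
     * behind.  Assumed, incomplete list: they identify no peer, so a
     * multi-peer server can only ignore them. */
    if (err == ECONNREFUSED ||   /* port unreachable */
        err == EHOSTUNREACH ||   /* host unreachable */
        err == ENETUNREACH)      /* network unreachable */
        return RECV_RETRY;

    /* Everything else (EFAULT, EBADF, ENOTSOCK, ...) is a real bug
     * or a real socket problem. */
    return RECV_FATAL;
}
```

Any errno missing from the middle group gets treated as fatal, so a
server built this way is only as robust as its guess at that list.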

(#3 is probably worse if the application uses one-shot notifications
-- the application needs to make a decision as to whether to call
recvfrom/recvmsg again or go back to polling.)
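
For the one-shot case, the decision looks roughly like the sketch
below.  drain_dgram_socket and is_stale_icmp_errno are hypothetical
names, the ignorable set is again an assumption, and the "retry
immediately" branch relies on my understanding that a failed receive
consumes the pending error:

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Assumed, non-exhaustive set of errno values that only mean "some
 * peer triggered an ICMP error at some point". */
static bool is_stale_icmp_errno(int err)
{
    return err == ECONNREFUSED || err == EHOSTUNREACH ||
           err == ENETUNREACH;
}

/*
 * Hypothetical helper: drain a datagram socket after a one-shot
 * readiness notification.  Returns the number of datagrams read once
 * the queue is empty (the caller should then re-arm the notification),
 * or -1 with errno set on an error outside the ignorable set.
 */
static int drain_dgram_socket(int fd)
{
    char buf[2048];
    int count = 0;

    for (;;) {
        struct sockaddr_storage peer;
        socklen_t peer_len = sizeof(peer);
        ssize_t n = recvfrom(fd, buf, sizeof(buf), MSG_DONTWAIT,
                             (struct sockaddr *)&peer, &peer_len);

        if (n >= 0) {
            count++;  /* a real server would dispatch the datagram here */
            continue;
        }
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return count;  /* queue empty: safe to go back to polling */
        if (errno == EINTR || is_stale_icmp_errno(errno))
            continue;      /* not a socket problem: try again right away */
        return -1;         /* anything else is treated as a real failure */
    }
}
```

If the ignorable branch returned to polling instead of retrying, an
edge-triggered or one-shot loop could leave queued datagrams unread
until the next wakeup, so the server has to get that branch right for
every errno the kernel might hand it.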

--Andy