Re: Suboptimal error handling in libnftables

Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> · Thu, 2 Dec 2021 14:54:02 +0100

On Thu, Dec 02, 2021 at 02:16:12PM +0100, Eugene Crosser wrote:
> Hello,
> 
> there is a read-from-the-socket loop in src/iface.c line 90 (function
> iface_cache_update()), and it (and other places) calls the macro
> netlink_init_error() to report errors. The function behind the macro is
> in src/netlink.c line 81, and it calls exit(NFT_EXIT_NONL) after writing
> a message to stderr.
> 
> I see two problems with this:
> 
> 1. All read-from-the-socket functions should be run in a loop, repeating
> if return code is -1 and errno is EINTR. I.e. EINTR should not be
> treated as an error, but as a condition that requires retry.
> 
> 2. Library functions are not supposed to call exit() (or abort() for
> that matter). They are expected to return an error indication to the
> caller, who may have its own strategy for handling error conditions.
> 
> Case in point, we have a daemon (in Python) that uses bindings to
> libnftables. It's a service responding to requests coming over a TCP
> connection, and it takes care to intercept any error situations and
> report them back. We discovered that under some conditions, it just
> closes the socket and goes away. This being a daemon, stderr was not
> immediately accessible; and even if it were, it is pretty hard to figure
> out where the message "iface.c:98: Unable to initialize Netlink socket:
> Interrupted system call" came from and why!

The missing EINTR handling in iface_cache_update() is a bug. Would you
post a patch for it?
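
For reference, the usual retry pattern looks roughly like the sketch
below. It is only an illustration, not the actual fix: the helper name
read_retry_eintr() is hypothetical and does not exist in libnftables,
and plain read() stands in for whatever Netlink receive call
iface_cache_update() really uses.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of libnftables): read from fd, and
 * retry when the call is interrupted by a signal before any data was
 * transferred. Any other error is returned to the caller unchanged,
 * with errno still set, so the caller decides how to handle it.
 */
static ssize_t read_retry_eintr(int fd, void *buf, size_t len)
{
	ssize_t ret;

	do {
		ret = read(fd, buf, len);
	} while (ret == -1 && errno == EINTR);

	return ret;
}
```

The same do/while shape applies to recv() and similar calls that
return -1 with errno set to EINTR.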

> There is another function that calls exit(), __netlink_abi_error(). I
> believe that even in such a harsh situation, exit() is not the right way
> to handle it.

ABI breakage between kernel and userspace should never happen.