Pablo Neira Ayuso wrote:
> This patch improves ctnetlink event reliability if one broadcast listener has set the NETLINK_BROADCAST_ERROR socket option. The logic is the following: if the event delivery fails, ctnetlink sets the IPCT_DELIVERY_FAILED event bit and keeps the undelivered events in the conntrack event cache. Thus, once the next packet arrives, we trigger another event delivery in nf_conntrack_in(). If things don't go well in this second try, we accumulate the pending events in the cache but try to deliver the current state as soon as possible. Therefore, we may lose state transitions, but the userspace process gets back in sync at some point.
Sounds good so far. Except - this won't work without per-conntrack events I guess. And those would need (quite a lot of) atomic operations again.
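To make the retry logic described above concrete, here is a minimal user-space model of it (not the actual kernel code; the struct and function names are illustrative, not real nf_conntrack symbols, and the listener outcome is simply passed in as a flag): undelivered events are kept in a per-conntrack "missed" mask and re-offered together with any new events on the next packet.

/*
 * Toy model of the proposed per-conntrack event cache with retry.
 * Events whose broadcast failed stay in 'missed' and are re-offered
 * the next time the packet path calls deliver_cached_events().
 */
#include <stdio.h>
#include <stdbool.h>

#define IPCT_NEW      (1 << 0)
#define IPCT_ASSURED  (1 << 1)
#define IPCT_DESTROY  (1 << 2)

struct ct_ecache {
	unsigned int cache;   /* events generated since last delivery */
	unsigned int missed;  /* events whose delivery failed earlier */
};

/* stand-in for the netlink broadcast; flip 'ok' to simulate an overrun */
static bool deliver(unsigned int events, bool ok)
{
	printf("deliver 0x%x -> %s\n", events, ok ? "ok" : "FAILED");
	return ok;
}

/* called from the packet path: try cached plus previously missed events */
static void deliver_cached_events(struct ct_ecache *e, bool listener_ok)
{
	unsigned int events = e->cache | e->missed;

	if (!events)
		return;
	e->cache = 0;
	if (deliver(events, listener_ok))
		e->missed = 0;          /* userspace is in sync again */
	else
		e->missed = events;     /* accumulate, retry on next packet */
}

int main(void)
{
	struct ct_ecache e = { 0, 0 };

	e.cache |= IPCT_NEW;
	deliver_cached_events(&e, false);   /* first try fails, NEW is kept */

	e.cache |= IPCT_ASSURED;
	deliver_cached_events(&e, true);    /* next packet: NEW|ASSURED sent */
	return 0;
}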
> In the worst case, if no events were delivered to userspace, we make sure that destroy events are successfully delivered. This happens because, if ctnetlink fails to deliver the destroy event, we re-add the conntrack timer with an extra grace timeout of 15 seconds to trigger the event again (this grace timeout is tunable via /proc). If a packet closing the flow (like a TCP RST) kills the conntrack entry but the event delivery fails, we drop the packet and keep the entry in the kernel. This is the only case in which we may drop packets.
These last two points don't seem like good behaviour at all. The timeouts are supposed to have a meaning, so they should *at least* deactivate the conntrack. Explicit removal of the conntrack should *never* fail; TCP conntrack needs it for connection reopening, handling out-of-sync sessions, etc.
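In the same toy model, the destroy path below sketches the behaviour the patch description claims (and which the reply objects to): if the DESTROY event cannot be delivered, the entry is not freed and its timer is re-armed with the grace period, i.e. the 15 seconds, /proc-tunable, mentioned above. Again, the names are made up for illustration.

/*
 * Sketch of the destroy-event behaviour: on delivery failure the entry
 * survives and the timer is re-armed so delivery can be retried.
 */
#include <stdio.h>
#include <stdbool.h>

#define DESTROY_GRACE_SECS 15   /* the proposed /proc-tunable default */

struct conntrack {
	unsigned long timeout;   /* seconds until the timer fires again */
	bool dying;
};

static bool deliver_destroy(bool listener_ok)
{
	printf("DESTROY event -> %s\n", listener_ok ? "delivered" : "failed");
	return listener_ok;
}

/* timer expiry / explicit kill path */
static void ct_kill(struct conntrack *ct, bool listener_ok)
{
	if (deliver_destroy(listener_ok)) {
		printf("entry freed\n");
		return;
	}
	/* delivery failed: keep the entry and retry after the grace period */
	ct->dying = true;
	ct->timeout = DESTROY_GRACE_SECS;
	printf("entry kept, timer re-armed for %lus\n", ct->timeout);
}

int main(void)
{
	struct conntrack ct = { .timeout = 30, .dying = false };

	ct_kill(&ct, false);   /* listener overrun: entry survives */
	ct_kill(&ct, true);    /* grace timer fires again: now freed */
	return 0;
}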
> For expectations, the logic is simpler: if the event delivery fails, we drop the packet.
.. while restoring the expectation to an active state I hope?
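For reference, a small sketch of the expectation path as described, with the rollback the reply asks about marked explicitly; this sketch assumes that rollback is done. Only the NF_ACCEPT/NF_DROP verdict values come from the kernel headers; everything else is hypothetical.

/*
 * Toy sketch of the expectation path: if the "new expectation" event
 * cannot be broadcast, the packet that created it is dropped, and here
 * the expectation is also rolled back so the retransmission can
 * re-create it cleanly (the open question in the reply above).
 */
#include <stdio.h>
#include <stdbool.h>

#define NF_DROP   0
#define NF_ACCEPT 1

struct expectation {
	bool active;
};

static bool deliver_exp_new(bool listener_ok)
{
	printf("EXP_NEW event -> %s\n", listener_ok ? "delivered" : "failed");
	return listener_ok;
}

static int handle_new_expectation(struct expectation *exp, bool listener_ok)
{
	exp->active = true;

	if (deliver_exp_new(listener_ok))
		return NF_ACCEPT;

	/*
	 * Delivery failed: drop the packet and return the expectation to
	 * its previous state, so the retransmitted packet sets it up again.
	 */
	exp->active = false;
	return NF_DROP;
}

int main(void)
{
	struct expectation exp = { false };

	printf("verdict: %d\n", handle_new_expectation(&exp, false));
	printf("verdict: %d\n", handle_new_expectation(&exp, true));
	return 0;
}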