Re: [RFC] Implement connection tracking event over netlink unicast

Patrick McHardy <kaber@xxxxxxxxx> · Mon, 25 Feb 2008 14:02:33 +0100

Pablo Neira Ayuso wrote:
Hi,

Time ago I came up with the idea of adding a new target which marks
events as volatile to explicitly tell ctnetlink not to deliver events
over netlink for the traffic that matches the rule.

Although this approach is simple, it has serious limitations. For
example, in a ulogd + conntrackd setup the user may want to log all the
traffic but only replicate certain connections to the backup node. Also,
the conntrackd user may want to replicate only ESTABLISHED states
instead of all states to reduce resource consumption in a busy firewall
but log only the destroy messages coming from ctnetlink for accounting
purposes. In both cases, this target would not help.

For that reason, I decided to start a patch which implements connection
tracking event delivery over netlink unicast in a similar fashion as
other netfilter subsystems such as log and queue.

Attached a RFC patch (yet untested with no IPv6 support and no userspace
part) that implements connection tracking event delivery over netlink
unicast. This patch introduces the NFCONNTRACK target which is similar
to NFQUEUE and NFLOG. An example of usage is:

iptables -I POSTROUTING -t events -m state ESTABLISHED -j NFCONNTRACK
--queue-num 1

With this rule, only the ESTABLISHED connection tracking events are
delivered over netlink unicast to the userspace program listening in the
queue number 1. This enables events filtering through iptables, so that
a user can tell what events are propagated to userspace. This approach
removes the limitation of my initial idea since we can do specific
filtering for ulogd and conntrackd, both listeing in different queue
numbers, without disturbing each other.

Makes sense.

This patch introduces several important changes:

* A new table, so-called 'events', which is placed at the end of
postrouting once the connection tracking has successfully confirmed a
conntrack entry to avoid event delivery of possibly dropped packets in
the 'filter' table. In practise, this table is only loaded as module
when the user wants to perform event filtering so that it does not
introduce any overhead for users that do not need this feature.

I hate new tables :) Its a bit sad that we need this, its not exactly
consistent that NOTRACK works in the raw table and this in the events
table.

Will there also be a new match to filter on connection state?
Using ESTABLISHED is pretty useless since in the events table
all connections will be ESTABLISHED.

* A new target, so-called NFCONNTRACK, which can only be inserted in the
events table.

* An extra hook which is only present if connection tracking event
delivery is enabled.

In terms of hooks, the changes are the following:

        NF_IP_PRI_CONNTRACK_CONFIRM = INT_MAX - 2,
        NF_IP_PRI_EVENTS_TABLE = INT_MAX - 1,
        NF_IP_PRI_CONNTRACK_EVENTS = INT_MAX,
        NF_IP_PRI_LAST = INT_MAX,

The 'events' table is placed after the conntrack confirmation, and the
final event delivery through the ecache infrastructure is placed at the
end. If we schedule for removal events delivery over netlink broadcast
we can get rid of NF_IP_PRI_CONNTRACK_EVENTS hook, however, this would
break backward anyway. Another choice to remove this hook can be to
recover the idea of the per-skbuff event cache, however, this means 4
bytes extra per skbuff.

Another argument in favor of this patch is that it homogenize the
netfilter subsystem so that they all require a target to enable netlink
features.

I don't think we should remove broadcast event delivery. QUEUE and LOG
are different from conntrack in the sense that conntrack really does
react to internal events, which are non-existant for QUEUE and LOG.

Comments welcome.

A general comment: it would make this easier to review if you'd
split it up more fine-grained, for example:

- 1: conntrack event delivery changes
- 2: conntrack unicast event delivery
- 3: event table
- 4: NFCONNTRACK target

Some implementation-specific comments below:

 #ifdef CONFIG_NF_CONNTRACK_EVENTS
...

This stuff shouldn't depend on conntrack-events in my opinion.
Conntrack-events should merely control whether the event cache
and similar stuff is included in nf_conntrack, without which
the ctnetlink event delivery does not work, but the manual
unicast delivery should work regardless, right?

OK I see below that it needs the event cache, still it would
be nice to have this working without the notifications from
conntrack.

+
+static HLIST_HEAD(equeue_list);

Are you sure you don't want a small hash table for this?

+static DEFINE_SPINLOCK(equeue_lock);
+
+struct nfct_equeue {
+	struct hlist_node hlist;
+	struct rcu_head rcu;
+
+	int queue_num;

unsigned or u16 would make more sense.

+	int peer_pid;
+};
+
+static struct nfct_equeue *ctnetlink_lookup_equeue(int queue_num)
+{
+	struct hlist_node *pos;
+	struct nfct_equeue *this;
+
+	hlist_for_each_entry_rcu(this, pos, &equeue_list, hlist) {
+		if (this->queue_num == queue_num)
+			return this;
+	}
+	return NULL;
+}
+
+static int ctnetlink_create_equeue(u_int16_t queue_num, int pid)

netlink uses u32 for the pids, this should too for consistency.

+static void ctnetlink_destroy_equeue(u_int16_t queue_num, int pid)
+{
+	struct nfct_equeue *equeue;
+
+	spin_lock(&equeue_lock);
+
+	equeue = ctnetlink_lookup_equeue(queue_num);
+	BUG_ON(equeue == NULL);

This is wrong, we shouldn't crash just because userspace specified
an invalid queue number.

+
+	__ctnetlink_destroy_equeue(equeue);
+
+	spin_unlock(&equeue_lock);
+}
+
+static const struct nla_policy ct_events_nla_policy[CTA_UEVENT_MAX+1] = {
+	[CTA_UEVENT_CFG_BIND] 	= { .type = NLA_U16 },
+	[CTA_UEVENT_CFG_UNBIND] = { .type = NLA_U16 },
+};
+
+static int
+ctnetlink_uevents_conntrack(struct sock *ctnl, struct sk_buff *skb,
+			    struct nlmsghdr *nlh, struct nlattr *cda[])
+{
+	int ret;
+
+	if (cda[CTA_UEVENT_CFG_BIND]) {
+		u_int16_t queue_num = nla_get_be16(cda[CTA_UEVENT_CFG_BIND]);
+
+		ret = ctnetlink_create_equeue(queue_num, NETLINK_CB(skb).pid);
+		if (ret < 0)
+			return ret;

This will return -ENOTSUPP below in case no error occured.

+	}
+	if (cda[CTA_UEVENT_CFG_UNBIND]) {
+		u_int16_t queue_num = nla_get_be16(cda[CTA_UEVENT_CFG_UNBIND]);
+
+		ctnetlink_destroy_equeue(queue_num, NETLINK_CB(skb).pid);
+		return 0;
+	}
+
+	return -ENOTSUPP;
+}
+
+static int
+ctnetlink_nl_event(struct notifier_block *this, unsigned long event, void *ptr)
+{
+	struct nfct_equeue *equeue;
+	struct hlist_node *tmp, *tmp2;
+	struct netlink_notify *n = ptr;
+
+	if (event != NETLINK_URELEASE ||
+	    n->protocol != NETLINK_NETFILTER || !n->pid)
+	    	return NOTIFY_DONE;
+
+	spin_lock(&equeue_lock);
+	hlist_for_each_entry_safe(equeue, tmp, tmp2, &equeue_list, hlist) {
+		if ((n->net == &init_net) && (n->pid == equeue->peer_pid)) {
+		    	__ctnetlink_destroy_equeue(equeue);
+		}

No unnecessary parens please (both the conditions and the expression).

+	}
+	spin_unlock(&equeue_lock);
+
+	return NOTIFY_DONE;
+}

+int nf_ct_send_uevent(const struct nf_conn *ct, int queue_num)
+{
+	struct sk_buff *skb;
+	struct nfct_equeue *equeue;
+	unsigned int group;
+	unsigned int events;
+
+	equeue = ctnetlink_lookup_equeue(queue_num);
+	if (!equeue)
+		return -1;
+
+	if (!nf_ct_get_current_events(ct, &events))
+		return -1;

I don't think this will work. nf_ct_get_current_events does a lookup
in the event cache, which assumes we're still running on the same
CPU, which is not necessarily true.

+
+	skb = ctnetlink_build_event_msg(ct, events, &group);
+	if (!skb)
+		return -1;
+
+	return nfnetlink_unicast(skb, equeue->peer_pid, MSG_DONTWAIT);
+}
+EXPORT_SYMBOL_GPL(nf_ct_send_uevent);


Index: net-2.6.git/net/netfilter/xt_NFCONNTRACK.c
===================================================================

--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ net-2.6.git/net/netfilter/xt_NFCONNTRACK.c	2008-02-22 19:33:35.000000000 +0100
@@ -0,0 +1,75 @@
+/* iptables module for using new netfilter netlink conntrack unicast events
+ *
+ * (C) 2008 by Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/module.h>
+#include <linux/skbuff.h>
+
+#include <linux/netfilter.h>
+#include <linux/netfilter/x_tables.h>
+#include <linux/netfilter/xt_NFCONNTRACK.h>
+#include <net/netfilter/nf_conntrack.h>
+#include <linux/netfilter/nfnetlink_conntrack.h>
+
+MODULE_AUTHOR("Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx>");
+MODULE_DESCRIPTION("[ip,ip6]_tables NFCONNTRACK target");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS("ipt_NFCONNTRACK");
+MODULE_ALIAS("ip6t_NFCONNTRACK");


Not directly related, but this alias stuff sucks. We should probe
the x_tables modules and only use aliases for compatibility with
old scripts doing manual module loading.

+static unsigned int
+nfct_tg(struct sk_buff *skb, const struct net_device *in,
+	const struct net_device *out, unsigned int hooknum,
+	const struct xt_target *target, const void *targinfo)
+{
+	const struct xt_NFCT_info *tinfo = targinfo;
+	const struct nf_conn *ct;
+	enum ip_conntrack_info ctinfo;
+
+	ct = nf_ct_get(skb, &ctinfo);
+	if (!ct)
+		return XT_CONTINUE;
+
+	/* ignore our fake conntrack entry */
+	if (ct == &nf_conntrack_untracked)
+		return XT_CONTINUE;
+
+	return nf_ct_send_uevent(ct, tinfo->queuenum);
+}
+
+static struct xt_target nfct_tg_reg[] __read_mostly = {
+	{
+		.name		= "NFCONNTRACK",
+		.family		= AF_INET,
+		.target		= nfct_tg,
+		.targetsize	= sizeof(struct xt_NFCT_info),
+		.table		= "events",
+		.me		= THIS_MODULE,
+	},
+	{
+		.name		= "NFCONNTRACK",
+		.family		= AF_INET6,
+		.target		= nfct_tg,
+		.targetsize	= sizeof(struct xt_NFCT_info),
+		.table		= "events",
+		.me		= THIS_MODULE,
+	},
+};
+
+static int __init nfct_tg_init(void)
+{
+	return xt_register_targets(nfct_tg_reg, ARRAY_SIZE(nfct_tg_reg));
+}
+
+static void __exit nfct_tg_exit(void)
+{
+	xt_unregister_targets(nfct_tg_reg, ARRAY_SIZE(nfct_tg_reg));
+}
+
+module_init(nfct_tg_init);
+module_exit(nfct_tg_exit);
Index: net-2.6.git/include/linux/netfilter_ipv4.h
===================================================================
--- net-2.6.git.orig/include/linux/netfilter_ipv4.h	2008-02-22 18:51:48.000000000 +0100
+++ net-2.6.git/include/linux/netfilter_ipv4.h	2008-02-22 18:53:39.000000000 +0100
@@ -62,9 +62,11 @@ enum nf_ip_hook_priorities {
 	NF_IP_PRI_FILTER = 0,
 	NF_IP_PRI_NAT_SRC = 100,
 	NF_IP_PRI_SELINUX_LAST = 225,
-	NF_IP_PRI_CONNTRACK_HELPER = INT_MAX - 2,
-	NF_IP_PRI_NAT_SEQ_ADJUST = INT_MAX - 1,
-	NF_IP_PRI_CONNTRACK_CONFIRM = INT_MAX,
+	NF_IP_PRI_CONNTRACK_HELPER = INT_MAX - 4,
+	NF_IP_PRI_NAT_SEQ_ADJUST = INT_MAX - 3,
+	NF_IP_PRI_CONNTRACK_CONFIRM = INT_MAX - 2,
+	NF_IP_PRI_EVENTS_TABLE = INT_MAX - 1,
+	NF_IP_PRI_CONNTRACK_EVENTS = INT_MAX,

Why do you need the CONNTRACK_EVENTS hook exactly?
-
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html