Pablo Neira Ayuso wrote:
Hi, Time ago I came up with the idea of adding a new target which marks events as volatile to explicitly tell ctnetlink not to deliver events over netlink for the traffic that matches the rule. Although this approach is simple, it has serious limitations. For example, in a ulogd + conntrackd setup the user may want to log all the traffic but only replicate certain connections to the backup node. Also, the conntrackd user may want to replicate only ESTABLISHED states instead of all states to reduce resource consumption in a busy firewall but log only the destroy messages coming from ctnetlink for accounting purposes. In both cases, this target would not help. For that reason, I decided to start a patch which implements connection tracking event delivery over netlink unicast in a similar fashion as other netfilter subsystems such as log and queue. Attached a RFC patch (yet untested with no IPv6 support and no userspace part) that implements connection tracking event delivery over netlink unicast. This patch introduces the NFCONNTRACK target which is similar to NFQUEUE and NFLOG. An example of usage is: iptables -I POSTROUTING -t events -m state ESTABLISHED -j NFCONNTRACK --queue-num 1 With this rule, only the ESTABLISHED connection tracking events are delivered over netlink unicast to the userspace program listening in the queue number 1. This enables events filtering through iptables, so that a user can tell what events are propagated to userspace. This approach removes the limitation of my initial idea since we can do specific filtering for ulogd and conntrackd, both listeing in different queue numbers, without disturbing each other.
Makes sense.
This patch introduces several important changes: * A new table, so-called 'events', which is placed at the end of postrouting once the connection tracking has successfully confirmed a conntrack entry to avoid event delivery of possibly dropped packets in the 'filter' table. In practise, this table is only loaded as module when the user wants to perform event filtering so that it does not introduce any overhead for users that do not need this feature.
I hate new tables :) Its a bit sad that we need this, its not exactly consistent that NOTRACK works in the raw table and this in the events table. Will there also be a new match to filter on connection state? Using ESTABLISHED is pretty useless since in the events table all connections will be ESTABLISHED.
* A new target, so-called NFCONNTRACK, which can only be inserted in the events table. * An extra hook which is only present if connection tracking event delivery is enabled. In terms of hooks, the changes are the following: NF_IP_PRI_CONNTRACK_CONFIRM = INT_MAX - 2, NF_IP_PRI_EVENTS_TABLE = INT_MAX - 1, NF_IP_PRI_CONNTRACK_EVENTS = INT_MAX, NF_IP_PRI_LAST = INT_MAX, The 'events' table is placed after the conntrack confirmation, and the final event delivery through the ecache infrastructure is placed at the end. If we schedule for removal events delivery over netlink broadcast we can get rid of NF_IP_PRI_CONNTRACK_EVENTS hook, however, this would break backward anyway. Another choice to remove this hook can be to recover the idea of the per-skbuff event cache, however, this means 4 bytes extra per skbuff. Another argument in favor of this patch is that it homogenize the netfilter subsystem so that they all require a target to enable netlink features.
I don't think we should remove broadcast event delivery. QUEUE and LOG are different from conntrack in the sense that conntrack really does react to internal events, which are non-existant for QUEUE and LOG.
Comments welcome.
A general comment: it would make this easier to review if you'd split it up more fine-grained, for example: - 1: conntrack event delivery changes - 2: conntrack unicast event delivery - 3: event table - 4: NFCONNTRACK target Some implementation-specific comments below:
#ifdef CONFIG_NF_CONNTRACK_EVENTS ...
This stuff shouldn't depend on conntrack-events in my opinion. Conntrack-events should merely control whether the event cache and similar stuff is included in nf_conntrack, without which the ctnetlink event delivery does not work, but the manual unicast delivery should work regardless, right? OK I see below that it needs the event cache, still it would be nice to have this working without the notifications from conntrack.
+ +static HLIST_HEAD(equeue_list);
Are you sure you don't want a small hash table for this?
+static DEFINE_SPINLOCK(equeue_lock); + +struct nfct_equeue { + struct hlist_node hlist; + struct rcu_head rcu; + + int queue_num;
unsigned or u16 would make more sense.
+ int peer_pid; +}; + +static struct nfct_equeue *ctnetlink_lookup_equeue(int queue_num) +{ + struct hlist_node *pos; + struct nfct_equeue *this; + + hlist_for_each_entry_rcu(this, pos, &equeue_list, hlist) { + if (this->queue_num == queue_num) + return this; + } + return NULL; +} + +static int ctnetlink_create_equeue(u_int16_t queue_num, int pid)
netlink uses u32 for the pids, this should too for consistency.
+static void ctnetlink_destroy_equeue(u_int16_t queue_num, int pid) +{ + struct nfct_equeue *equeue; + + spin_lock(&equeue_lock); + + equeue = ctnetlink_lookup_equeue(queue_num); + BUG_ON(equeue == NULL);
This is wrong, we shouldn't crash just because userspace specified an invalid queue number.
+ + __ctnetlink_destroy_equeue(equeue); + + spin_unlock(&equeue_lock); +} + +static const struct nla_policy ct_events_nla_policy[CTA_UEVENT_MAX+1] = { + [CTA_UEVENT_CFG_BIND] = { .type = NLA_U16 }, + [CTA_UEVENT_CFG_UNBIND] = { .type = NLA_U16 }, +}; + +static int +ctnetlink_uevents_conntrack(struct sock *ctnl, struct sk_buff *skb, + struct nlmsghdr *nlh, struct nlattr *cda[]) +{ + int ret; + + if (cda[CTA_UEVENT_CFG_BIND]) { + u_int16_t queue_num = nla_get_be16(cda[CTA_UEVENT_CFG_BIND]); + + ret = ctnetlink_create_equeue(queue_num, NETLINK_CB(skb).pid); + if (ret < 0) + return ret;
This will return -ENOTSUPP below in case no error occured.
+ } + if (cda[CTA_UEVENT_CFG_UNBIND]) { + u_int16_t queue_num = nla_get_be16(cda[CTA_UEVENT_CFG_UNBIND]); + + ctnetlink_destroy_equeue(queue_num, NETLINK_CB(skb).pid); + return 0; + } + + return -ENOTSUPP; +} + +static int +ctnetlink_nl_event(struct notifier_block *this, unsigned long event, void *ptr) +{ + struct nfct_equeue *equeue; + struct hlist_node *tmp, *tmp2; + struct netlink_notify *n = ptr; + + if (event != NETLINK_URELEASE || + n->protocol != NETLINK_NETFILTER || !n->pid) + return NOTIFY_DONE; + + spin_lock(&equeue_lock); + hlist_for_each_entry_safe(equeue, tmp, tmp2, &equeue_list, hlist) { + if ((n->net == &init_net) && (n->pid == equeue->peer_pid)) { + __ctnetlink_destroy_equeue(equeue); + }
No unnecessary parens please (both the conditions and the expression).
+ } + spin_unlock(&equeue_lock); + + return NOTIFY_DONE; +}
+int nf_ct_send_uevent(const struct nf_conn *ct, int queue_num) +{ + struct sk_buff *skb; + struct nfct_equeue *equeue; + unsigned int group; + unsigned int events; + + equeue = ctnetlink_lookup_equeue(queue_num); + if (!equeue) + return -1; + + if (!nf_ct_get_current_events(ct, &events)) + return -1;
I don't think this will work. nf_ct_get_current_events does a lookup in the event cache, which assumes we're still running on the same CPU, which is not necessarily true.
+ + skb = ctnetlink_build_event_msg(ct, events, &group); + if (!skb) + return -1; + + return nfnetlink_unicast(skb, equeue->peer_pid, MSG_DONTWAIT); +} +EXPORT_SYMBOL_GPL(nf_ct_send_uevent);
Index: net-2.6.git/net/netfilter/xt_NFCONNTRACK.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ net-2.6.git/net/netfilter/xt_NFCONNTRACK.c 2008-02-22 19:33:35.000000000 +0100 @@ -0,0 +1,75 @@ +/* iptables module for using new netfilter netlink conntrack unicast events + * + * (C) 2008 by Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +#include <linux/module.h> +#include <linux/skbuff.h> + +#include <linux/netfilter.h> +#include <linux/netfilter/x_tables.h> +#include <linux/netfilter/xt_NFCONNTRACK.h> +#include <net/netfilter/nf_conntrack.h> +#include <linux/netfilter/nfnetlink_conntrack.h> + +MODULE_AUTHOR("Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx>"); +MODULE_DESCRIPTION("[ip,ip6]_tables NFCONNTRACK target"); +MODULE_LICENSE("GPL"); +MODULE_ALIAS("ipt_NFCONNTRACK"); +MODULE_ALIAS("ip6t_NFCONNTRACK");
Not directly related, but this alias stuff sucks. We should probe the x_tables modules and only use aliases for compatibility with old scripts doing manual module loading.
+static unsigned int +nfct_tg(struct sk_buff *skb, const struct net_device *in, + const struct net_device *out, unsigned int hooknum, + const struct xt_target *target, const void *targinfo) +{ + const struct xt_NFCT_info *tinfo = targinfo; + const struct nf_conn *ct; + enum ip_conntrack_info ctinfo; + + ct = nf_ct_get(skb, &ctinfo); + if (!ct) + return XT_CONTINUE; + + /* ignore our fake conntrack entry */ + if (ct == &nf_conntrack_untracked) + return XT_CONTINUE; + + return nf_ct_send_uevent(ct, tinfo->queuenum); +} + +static struct xt_target nfct_tg_reg[] __read_mostly = { + { + .name = "NFCONNTRACK", + .family = AF_INET, + .target = nfct_tg, + .targetsize = sizeof(struct xt_NFCT_info), + .table = "events", + .me = THIS_MODULE, + }, + { + .name = "NFCONNTRACK", + .family = AF_INET6, + .target = nfct_tg, + .targetsize = sizeof(struct xt_NFCT_info), + .table = "events", + .me = THIS_MODULE, + }, +}; + +static int __init nfct_tg_init(void) +{ + return xt_register_targets(nfct_tg_reg, ARRAY_SIZE(nfct_tg_reg)); +} + +static void __exit nfct_tg_exit(void) +{ + xt_unregister_targets(nfct_tg_reg, ARRAY_SIZE(nfct_tg_reg)); +} + +module_init(nfct_tg_init); +module_exit(nfct_tg_exit); Index: net-2.6.git/include/linux/netfilter_ipv4.h =================================================================== --- net-2.6.git.orig/include/linux/netfilter_ipv4.h 2008-02-22 18:51:48.000000000 +0100 +++ net-2.6.git/include/linux/netfilter_ipv4.h 2008-02-22 18:53:39.000000000 +0100 @@ -62,9 +62,11 @@ enum nf_ip_hook_priorities { NF_IP_PRI_FILTER = 0, NF_IP_PRI_NAT_SRC = 100, NF_IP_PRI_SELINUX_LAST = 225, - NF_IP_PRI_CONNTRACK_HELPER = INT_MAX - 2, - NF_IP_PRI_NAT_SEQ_ADJUST = INT_MAX - 1, - NF_IP_PRI_CONNTRACK_CONFIRM = INT_MAX, + NF_IP_PRI_CONNTRACK_HELPER = INT_MAX - 4, + NF_IP_PRI_NAT_SEQ_ADJUST = INT_MAX - 3, + NF_IP_PRI_CONNTRACK_CONFIRM = INT_MAX - 2, + NF_IP_PRI_EVENTS_TABLE = INT_MAX - 1, + NF_IP_PRI_CONNTRACK_EVENTS = INT_MAX,
Why do you need the CONNTRACK_EVENTS hook exactly? - To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html