Re: [PATCH] netfilter: xtables: add cluster match

Patrick McHardy wrote:
> Pablo Neira Ayuso wrote:
>> This patch adds the iptables cluster match. This match can be used
>> to deploy gateway and back-end load-sharing clusters.

> I'm mixing comments on the cluster match and the ARP mangle target.

>> Assuming that all the nodes see all packets (see below for an
>> example of how to do that if your switch does not allow this), the
>> cluster match decides if this node has to handle a packet given:

>>     jhash(source IP) % total_nodes == node_id
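A minimal C sketch of the per-packet decision described above, for illustration only. The function names are hypothetical, and the hash is a stand-in built from one round of Bob Jenkins' lookup2 mix rather than a copy of the kernel's jhash; what matters is that every node runs the same function with the same seed. The sketch also assumes that hash bucket h (0 to total_nodes - 1) belongs to node ID h + 1, since the example rules number nodes from 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for jhash on a single 32-bit word: one round of
 * Bob Jenkins' lookup2 mix. Not claimed to be bit-identical to the
 * kernel's jhash; all nodes only need to agree on function and seed. */
uint32_t hash_source_ip(uint32_t src_ip, uint32_t seed)
{
	uint32_t a = src_ip + 0x9e3779b9u;
	uint32_t b = 0x9e3779b9u;
	uint32_t c = seed;

	a -= b; a -= c; a ^= (c >> 13);
	b -= c; b -= a; b ^= (a << 8);
	c -= a; c -= b; c ^= (b >> 13);
	a -= b; a -= c; a ^= (c >> 12);
	b -= c; b -= a; b ^= (a << 16);
	c -= a; c -= b; c ^= (b >> 5);
	a -= b; a -= c; a ^= (c >> 3);
	b -= c; b -= a; b ^= (a << 10);
	c -= a; c -= b; c ^= (b >> 15);
	return c;
}

/* Returns true if the node with ID local_node (1-based, as passed to
 * --cluster-local-node) should handle a packet from src_ip. */
bool cluster_should_handle(uint32_t src_ip, uint32_t seed,
			   uint32_t total_nodes, uint32_t local_node)
{
	assert(total_nodes >= 1 && local_node >= 1 && local_node <= total_nodes);
	/* Assumption: hash bucket h belongs to node ID h + 1. */
	return hash_source_ip(src_ip, seed) % total_nodes + 1 == local_node;
}
```

For any source address, exactly one node ID in 1..total_nodes gets true, which is why every node can see every packet while only one of them marks it.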

>> For related connections, the master conntrack is used. The following
>> is an example of its use to deploy a gateway cluster composed of two
>> nodes (where this is node 1):

>> iptables -I PREROUTING -t mangle -i eth1 -m cluster \
>>     --cluster-total-nodes 2 --cluster-local-node 1 \
>>     --cluster-proc-name eth1 -j MARK --set-mark 0xffff
>> iptables -A PREROUTING -t mangle -i eth1 \
>>     -m mark ! --mark 0xffff -j DROP
>> iptables -A PREROUTING -t mangle -i eth2 -m cluster \
>>     --cluster-total-nodes 2 --cluster-local-node 1 \
>>     --cluster-proc-name eth2 -j MARK --set-mark 0xffff
>> iptables -A PREROUTING -t mangle -i eth2 \
>>     -m mark ! --mark 0xffff -j DROP

>> And the following commands make all nodes see the same packets:
>>
>> ip maddr add 01:00:5e:00:01:01 dev eth1
>> ip maddr add 01:00:5e:00:01:02 dev eth2
>> arptables -I OUTPUT -o eth1 --h-length 6 \
>>     -j mangle --mangle-mac-s 01:00:5e:00:01:01
>> arptables -I INPUT -i eth1 --h-length 6 \
>>     --destination-mac 01:00:5e:00:01:01 \
>>     -j mangle --mangle-mac-d 00:zz:yy:xx:5a:27

> Mhh, is the saving of one or two characters really worth these
> deviations from the kind of established naming scheme? It's hard
> to remember all these minor differences in my opinion.

Hm, do you mean the name "mangle" or the option name "--mangle-mac-d"? This is what we currently have in the mainline kernel and in arptables userspace, so it's not my fault :). I can send you a patch that introduces consistent naming without breaking backward compatibility in either the kernel or user space.

>> arptables -I OUTPUT -o eth2 --h-length 6 \
>>     -j mangle --mangle-mac-s 01:00:5e:00:01:02
>> arptables -I INPUT -i eth2 --h-length 6 \
>>     --destination-mac 01:00:5e:00:01:02 \
>>     -j mangle --mangle-mac-d 00:zz:yy:xx:5a:27
>>
>> In the case of TCP connections, the pickup facility has to be disabled
>> to avoid marking TCP ACK packets coming in the reply direction as
>> valid.
>>
>> echo 0 > /proc/sys/net/netfilter/nf_conntrack_tcp_loose

> I'm not sure I understand this. You *don't* want to mark them
> as valid, and you need to disable pickup for this?

If TCP pickup is enabled, a single TCP ACK packet coming in the reply direction moves the connection into the TCP ESTABLISHED state. Since that is a valid state transition, the cluster match considers the packet part of a connection that this node is handling. The cluster match does not mark packets that trigger invalid state transitions.

> Unrelated to this patch, but maybe the target would also be
> better named "NAT" instead of the much more generic term "mangle".
> Why is it using lower-case letters btw?

No idea who did this, but I can send you a patch to fix this naming without breaking backward compatibility.

>> The match also provides a /proc entry under:
>>
>> /proc/sys/net/netfilter/cluster/$PROC_NAME
>>
>> where PROC_NAME is set via --cluster-proc-name. This is useful for
>> handling cluster reconfigurations from fail-over scripts.
>> Assuming that this is node 1, if node 2 is down, you can add
>> node 2 to your node-mask as follows:
>>
>> echo +2 > /proc/sys/net/netfilter/cluster/$PROC_NAME

> Does this provide anything you can't do by replacing the rule
> itself?

Yes. The nodes in the cluster are identified by an ID, and the rule only allows you to specify one ID. Say you have two cluster nodes, one with ID 1 and the other with ID 2. If the node with ID 1 goes down, you can echo +1 to the node with ID 2 so that it handles packets going to both node ID 1 and node ID 2. Of course, you need conntrackd so that node ID 2 can recover the filtering state.
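The fail-over step just described can be sketched in C as well. This is a hypothetical illustration, not the patch's code: it assumes the node mask keeps one bit per node ID (bit id - 1), that a write such as "+1" sets that bit, and that "-1" would clear it again for fail-back; the mail only shows the "+" form. The function names are made up for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Apply a write such as "+1\n" (what echo produces) to the node mask.
 * Returns 0 on success, -1 on malformed input or an out-of-range ID. */
int node_mask_write(uint32_t *mask, const char *cmd, uint32_t total_nodes)
{
	char *end;
	long id;

	assert(mask != NULL && cmd != NULL);
	if ((cmd[0] != '+' && cmd[0] != '-') ||
	    !isdigit((unsigned char)cmd[1]))
		return -1;

	id = strtol(cmd + 1, &end, 10);
	/* Accept only the trailing newline that echo appends. */
	if (*end != '\0' && !(end[0] == '\n' && end[1] == '\0'))
		return -1;
	if (id < 1 || id > (long)total_nodes || id > 32)
		return -1;

	if (cmd[0] == '+')
		*mask |= 1u << (id - 1);	/* take over node `id` */
	else
		*mask &= ~(1u << (id - 1));	/* assumed fail-back form */
	return 0;
}

/* True if this node currently handles traffic for node_id (1-based). */
bool node_mask_has(uint32_t mask, uint32_t node_id)
{
	return node_id >= 1 && node_id <= 32 &&
	       ((mask >> (node_id - 1)) & 1u);
}
```

Under these assumptions, node 2 starts with mask 0x2; after "echo +1" the mask becomes 0x3 and node 2 claims the traffic of both node IDs, which is something a single-ID rule cannot express.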

Now I see a possible optimization: check whether a node has all bits of its node mask set with regard to the total number of nodes, in which case hashing can be skipped. But I think that's something we can add later.
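The optimization mentioned above might look roughly like this hypothetical C sketch: if the node mask covers every node up to total_nodes, the node claims all traffic and the hash never has to be computed. It assumes the mask keeps one bit per hash bucket (bit h for bucket h), and `demo_hash` is only a placeholder multiplicative hash, not the match's jhash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t (*cluster_hash_fn)(uint32_t src_ip);

/* Placeholder hash for illustration only (multiplicative hashing). */
uint32_t demo_hash(uint32_t src_ip)
{
	return src_ip * 2654435761u;
}

/* Mask with the low total_nodes bits set, i.e. "all nodes". */
uint32_t cluster_full_mask(uint32_t total_nodes)
{
	assert(total_nodes >= 1 && total_nodes <= 32);
	return total_nodes == 32 ? UINT32_MAX : (1u << total_nodes) - 1u;
}

bool cluster_match(uint32_t src_ip, uint32_t node_mask,
		   uint32_t total_nodes, cluster_hash_fn hash)
{
	uint32_t full = cluster_full_mask(total_nodes);

	/* Shortcut: this node has taken over every node, so it handles
	 * the packet regardless of the hash; skip computing it. */
	if ((node_mask & full) == full)
		return true;

	/* Otherwise hash the source address and test that bucket's bit. */
	return (node_mask >> (hash(src_ip) % total_nodes)) & 1u;
}
```

The check is a mask comparison per packet, so a surviving node that has absorbed the whole cluster saves one hash computation per packet at almost no cost to the normal case.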

--
"Los honestos son inadaptados sociales" -- Les Luthiers
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
