Re: Running an active/active firewall/router (xt_cluster?)

Hey Oliver,
That is exactly right. The suggestion of using a switch stack is also
about redundancy: if one switch has a hardware failure, the other
switch will still forward traffic. There is also the possibility of
trunking two interfaces across the two switches in the stack (one to
each), which means that if one of your firewalls fails, the remaining
firewall could handle the full 20 Gbps of traffic across its two
10 Gbps interfaces, as sketched below.
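
For illustration, a cross-switch LACP bond might be set up like this
with iproute2 (a sketch only; the interface names eth0/eth1 and 802.3ad
support on the stack are assumptions, not details from the original
setup):

  # create an LACP (802.3ad) bond and enslave one port per switch
  ip link add bond0 type bond mode 802.3ad
  ip link set eth0 down
  ip link set eth0 master bond0    # port going to switch 1
  ip link set eth1 down
  ip link set eth1 master bond0    # port going to switch 2
  ip link set bond0 up

The stack has to present the two ports as a single multi-chassis LAG
(MLT/SMLT in Avaya terms) for this to work.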

Also, on a side note: at the time we had chosen Avaya primarily for
latency reasons, not price, but when dealing with over 100 firewalls in
a mission-critical environment the price was nice for the budget.
Being one of their biggest clients at the time, we also had leverage
with their dev team to get them to prioritize fixing our issues :).
They are legitimately good switches that are underrated, with some cool
features; their shortest-path-bridging support is really awesome for
large-scale networks.

On Tue, May 11, 2021 at 8:25 AM Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote:
>
> Hi Oliver,
>
> On Tue, May 11, 2021 at 11:28:23AM +0200, Oliver Freyermuth wrote:
> > Hi Pablo,
> >
> > a short additional question after considering this for a while longer:
> >
> > Am 11.05.21 um 00:58 schrieb Oliver Freyermuth:
> > > > > [...]
> > > > > Basic tests show that this works as expected, but the details get messy.
> > > > >
> > > > > 1. Certainly, conntrackd is needed to synchronize connection states.
> > > > >     But is it always "fast enough"?  xt_cluster seems to match by the
> > > > >     src_ip of the original direction of the flow[0] (if I read the code
> > > > >     correctly), but what happens if the reply to an outgoing packet
> > > > >     arrives at both firewalls before state is synchronized?
> > > >
> > > > You can avoid this by setting DisableExternalCache to off. Then, in
> > > > case one of your firewall nodes goes down, update the cluster rules
> > > > and inject the entries (via keepalived, or your HA daemon of choice).
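> > > >
> > > > For example, conntrackd ships an example primary-backup.sh script
> > > > whose "primary" branch commits the external cache into the kernel
> > > > ("conntrackd -c"). A keepalived hook might look like this (a sketch;
> > > > the vrrp_instance name FW is hypothetical, and in an active/active
> > > > setup the script would also need to update the xt_cluster rules):
> > > >
> > > >   vrrp_sync_group G1 {
> > > >       group {
> > > >           FW    # hypothetical vrrp_instance
> > > >       }
> > > >       notify_master "/etc/conntrackd/primary-backup.sh primary"
> > > >       notify_backup "/etc/conntrackd/primary-backup.sh backup"
> > > >       notify_fault  "/etc/conntrackd/primary-backup.sh fault"
> > > >   }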
> > > >
> > > > The recommended configuration is DisableExternalCache off, with your
> > > > HA daemon properly configured to assist conntrackd. The conntrack
> > > > entries in the "external cache" of conntrackd are then added to the
> > > > kernel when needed.
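> > > >
> > > > A minimal sketch of the matching conntrackd.conf sync section (FTFW
> > > > mode; the multicast address, group and interface are illustrative
> > > > values, not a recommendation):
> > > >
> > > >   Sync {
> > > >       Mode FTFW {
> > > >           # keep remote states in the external cache until committed
> > > >           DisableExternalCache Off
> > > >       }
> > > >       Multicast Default {
> > > >           IPv4_address 225.0.0.50
> > > >           Group 3780
> > > >           IPv4_interface 192.168.100.1
> > > >           Interface eth2
> > > >       }
> > > >   }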
> > >
> > > You caused a classic "facepalm" moment. Of course, that will solve (1)
> > > completely. My initial decision to disable the external cache was made
> > > before I understood how xt_cluster works and before I found that it
> > > hashes on the original direction of the flow, and then it just escaped
> > > my mind.
> > > Thanks for clearing this up! :-)
> >
> > Thinking about this, the conntrack synchronization requirements
> > would essentially be "zero", since after a flow is established, it
> > stays on the same machine, and conntrackd synchronization is only
> > relevant on failover — right?
>
> Well, you have to preventively synchronize states because you do not
> know when your router will become unavailable, so one of the routers
> in your pool takes over flows, right? So it depends on whether there
> are HA requirements on your side for the existing flows.
>
> > So this approach would not limit or reduce the achievable bandwidth,
> > since the only ingredient is the mangling ruleset; so in case we
> > can't go for dynamic routing with Quagga and hardware router stacks,
> > this could even be a solution for high bandwidths?
>
> I think so, yes. However, note that you're spending cycles to drop
> packets that your node does not own.
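>
> For reference, the canonical two-node ruleset from the xt_cluster man
> page looks like this on node 1 (node 2 uses --cluster-local-node 2;
> eth1 and the hash seed are placeholders):
>
>   # mark the flows this node owns, drop everything else
>   iptables -t mangle -I PREROUTING -i eth1 -m cluster \
>       --cluster-total-nodes 2 --cluster-local-node 1 \
>       --cluster-hash-seed 0xdeadbeef \
>       -j MARK --set-mark 0xffff
>   iptables -t mangle -A PREROUTING -i eth1 \
>       -m mark ! --mark 0xffff -j DROP
>
> Both nodes also have to see all incoming packets, e.g. by subscribing
> to the same multicast MAC address ("ip maddress add 01:00:5e:00:01:01
> dev eth1") together with ARP/switch configuration that directs traffic
> to that MAC.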
>
> In case you have HA requirements, there are a number of trade-offs you
> can apply to reduce the synchronization workload, for example, only
> synchronizing TCP established connections to reduce the number of
> messages between the two routers. There is also tuning that you could
> explore: you could play with affinity to pin conntrackd to a CPU core
> which is *not* used to handle NIC interrupts. IIRC, there is a -j CT
> target in iptables that allows you to filter the netlink events that
> are sent to userspace conntrackd (e.g. you could send events only for
> "ct status assured" flows to userspace).



