[BUG][PATCH] dst cache overflow/leak in br_netfilter.c with ebtables-brnf-11-2_vs_2.4.31 and linux 2.4.36.6

Hi,

Here's a quick email trying to summarize the issue:

Context:
I recently set up an OpenVPN bridge between a router and a remote
machine. The router runs a patched 2.4.36.6 kernel (grsec, ja1, imq2,
a couple of patches from patch-o-matic-ng, and ebtables-brnf taken from
http://prdownloads.sourceforge.net/ebtables/ebtables-brnf-11-2_vs_2.4.31.diff.gz).
This machine has 192MB RAM, and it had been up and running for over
400 days before I set up the bridge, with the exact same kernel /
iptables rules. I then added a couple of ebtables rules to prevent
DHCP contamination across the bridge. Each side of the bridge has its
own default gw to the internet; on one side, this gateway is the
above-mentioned router, which thus sees some traffic (small LAN NATed
on a DSL line).

Issue:
A fairly short while after setting up the bridge (a matter of hours
mostly; the more traffic on the router, the quicker, and especially
quick when one of the machines behind the router was doing P2P, for
instance), the router running 2.4.36.6 would suddenly disappear from
the network, though the machine was still alive and running. Trying
to ping from the router would yield a "ping: sendto: no buffer space
available", and I noticed in the logs and dmesg a fair amount of
"dst cache overflow" messages.

Forensics:
Simply increasing the value of /proc/sys/net/ipv4/route/max_size
brought the networking back to life, but only for another short while
(mostly depending on the size of the new value). A quick check with
Robur's rtstat utility (fetched from
ftp://robur.slu.se/pub/Linux/net-development/rt_cache_stat/rtstat)
basically showed that the rt cache was steadily growing and
practically never came back down, hitting the max_size value and
eventually killing networking on the machine. In any case, the figure
bore no relation to the number of entries reported by
"ip route show cache". And nothing (echo 1 >
/proc/sys/net/ipv4/route/flush, ip route flush cache, flushing
iptables/ebtables rules, bringing the iface down/up, etc.) would get it
back to normal. Clearly, there was a leak.
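
For anyone wanting to watch the same counter without rtstat, here is a
minimal C sketch. It assumes that the first whitespace-separated field
of a per-CPU line in /proc/net/rt_cache_stat (2.4) or
/proc/net/stat/rt_cache (2.6) is the cache entry count in hexadecimal;
both the paths and the layout are assumptions to check against your
kernel. parse_rt_entries() and never_shrinks() are hypothetical helper
names, not part of rtstat or any other existing tool.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Return the first field of a stat line parsed as hex, or -1 if the
 * line does not start with a clean hex number (e.g. a header line).
 * ASSUMPTION: the first field is the route cache entry count. */
long parse_rt_entries(const char *line)
{
    char *end;
    unsigned long value;

    assert(line != NULL);
    while (isspace((unsigned char)*line))
        line++;
    if (!isxdigit((unsigned char)*line))
        return -1;
    value = strtoul(line, &end, 16);
    /* Reject words such as "entries" that merely begin with a hex digit. */
    if (*end != '\0' && !isspace((unsigned char)*end))
        return -1;
    return (long)value;
}

/* Return 1 if the samples never decrease and end higher than they
 * started, which is the growth pattern described above; else 0. */
int never_shrinks(const long *samples, size_t n)
{
    size_t i;

    if (n < 2)
        return 0;
    for (i = 1; i < n; i++)
        if (samples[i] < samples[i - 1])
            return 0;
    return samples[n - 1] > samples[0];
}
```

Feeding parse_rt_entries() one line per polling interval and passing
the collected values to never_shrinks() gives a crude leak indicator.
A healthy cache should shrink now and then as entries are garbage
collected.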

Fix(es):
As I was in quite a hurry (I'm taking off on vacation in a few hours
;) I dug around on Google and found a couple of related reports and
fixes, mostly for Linux 2.6. I thus tried to "adapt" them to 2.4.36.6,
but as I said I was in a rush, so I didn't really know what I was
doing. Still, the following patch (patching the ip_fragment()
function, as I first thought this was a fragmentation-related leak):

diff -ru linux-2.4.36.6.prev/net/ipv4/ip_output.c linux/net/ipv4/ip_output.c
--- linux-2.4.36.6.prev/net/ipv4/ip_output.c	2008-06-06 18:25:34.000000000 +0200
+++ linux/net/ipv4/ip_output.c	2008-08-14 21:39:23.000000000 +0200
@@ -844,6 +844,7 @@

 		if (skb->sk)
 			skb_set_owner_w(skb2, skb->sk);
+		dst_release(skb2->dst);
 		skb2->dst = dst_clone(skb->dst);
 		skb2->dev = skb->dev;


improved things a little bit. Really not much actually, but the rt
cache size figure did come back down a bit more often than before (or
so it seemed). Though the trend remained toward a steady, unstoppable
growth.

But then, this patch, backported from
http://www.ssi.bg/~ja/brnf_dst-2.6.20-1.diff, patching
br_nf_pre_routing_finish():

diff -ru linux-2.4.36.6.prev/net/bridge/br_netfilter.c linux/net/bridge/br_netfilter.c
--- linux-2.4.36.6.prev/net/bridge/br_netfilter.c	2008-06-06 18:25:34.000000000 +0200
+++ linux/net/bridge/br_netfilter.c	2008-08-15 00:17:56.000000000 +0200
@@ -216,6 +216,10 @@
 	skb->nf_debug ^= (1 << NF_BR_PRE_ROUTING);
 #endif

+	/* Old skb->dst is not expected, it is lost in all cases */
+	dst_release(skb->dst);
+	skb->dst = NULL;
+
 	if (nf_bridge->mask & BRNF_PKT_TYPE) {
 		skb->pkt_type = PACKET_OTHERHOST;
 		nf_bridge->mask ^= BRNF_PKT_TYPE;


really made the whole issue go away.
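
To illustrate the kind of leak the comment in that patch describes,
here is a userspace toy model in C. It is not kernel code: toy_dst,
toy_skb, toy_dst_clone(), toy_dst_release() and toy_run() are made-up
names that only mimic reference counting on a route cache entry. The
assumption being modelled is that a packet arrives holding a reference
to one entry and that a later route lookup overwrites the pointer; if
the old reference is not dropped first, the old entry keeps a non-zero
count and can never be reclaimed.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for a route cache entry and a packet (hypothetical). */
struct toy_dst { int refcnt; };
struct toy_skb { struct toy_dst *dst; };

/* Take a reference, in the spirit of dst_clone(). */
static struct toy_dst *toy_dst_clone(struct toy_dst *d)
{
    if (d)
        d->refcnt++;
    return d;
}

/* Drop a reference, in the spirit of dst_release(). NULL is a no-op. */
static void toy_dst_release(struct toy_dst *d)
{
    if (d) {
        assert(d->refcnt > 0);
        d->refcnt--;
    }
}

/* Push `packets` packets through a hook that replaces skb.dst.
 * With release_old == 0 the old reference is simply overwritten;
 * with release_old != 0 it is dropped first, as in the patch.
 * Returns the number of references still pinning the old entry. */
int toy_run(int packets, int release_old)
{
    struct toy_dst stale = { 0 };
    struct toy_dst fresh = { 0 };
    int i;

    for (i = 0; i < packets; i++) {
        /* The packet arrives already holding a reference. */
        struct toy_skb skb = { toy_dst_clone(&stale) };

        if (release_old) {
            toy_dst_release(skb.dst);
            skb.dst = NULL;
        }
        /* The route lookup installs a new entry over the pointer. */
        skb.dst = toy_dst_clone(&fresh);
        /* Freeing the packet releases only the current entry. */
        toy_dst_release(skb.dst);
    }
    return stale.refcnt;
}
```

In this model toy_run(1000, 0) leaves 1000 references on the stale
entry, one per packet, while toy_run(1000, 1) leaves none. That
matches the observed pattern of a cache that grows with traffic and
that no amount of flushing brings back down.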

Note: as my kernel is quite heavily patched already, the above may not
apply perfectly cleanly.

Anyway, given that at least the second patch "fixed" the problem as
far as I'm concerned (the first one might not be necessary; I didn't
check that yet), I believe that there's a bug in the current
ebtables-brnf 2.4 patch, and a nasty one at that, since it DoSes the
machine fairly quickly. I hope the above explanation/patches make
things clear enough and are helpful; again, I don't really know what
I'm doing ;-)

(I've attached both patches to this email in case gmail gets too smart
with line wrapping)

HTH

T-Bone

PS: please CC me on replies; I'm not subscribed, and I'll be offline
for the next 4 days.

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/

Attachment: dst_overflow.patch
Description: Binary data

