Hi,

Here's a quick email summarizing the issue.

Context:

I recently set up an OpenVPN bridge between a router and a remote machine, the router running a patched 2.4.36.6 kernel (grsec, ja1, imq2, a couple of patches from patch-o-matic-ng, and ebtables-brnf taken from http://prdownloads.sourceforge.net/ebtables/ebtables-brnf-11-2_vs_2.4.31.diff.gz). This machine has 192MB of RAM, and it had been up and running for over 400 days before I set up the bridge, with the exact same kernel and iptables rules. I then added a couple of ebtables rules to prevent DHCP contamination across the bridge. Each side of the bridge has its own default gateway to the Internet; on one side, that gateway happens to be the above-mentioned router, which thus sees some traffic (a small LAN NATed onto a DSL line).

Issue:

A fairly short while after setting up the bridge (mostly a matter of hours; the more traffic on the router the quicker, especially when one of the machines behind the router was doing P2P, for instance), the 2.4.36-running router would suddenly disappear from the network, although the machine itself was still alive and running. Trying to ping from the router would yield "ping: sendto: No buffer space available", and I noticed a fair number of "dst cache overflow" messages in the logs and dmesg.

Forensics:

Simply increasing the value of /proc/sys/net/ipv4/route/max_size brought the networking back to life, but only for another short while (mostly depending on the size of the new value). A quick check with Robur's rtstat utility (fetched from ftp://robur.slu.se/pub/Linux/net-development/rt_cache_stat/rtstat) showed that the rt cache was steadily growing and practically never came back down; it would hit the max_size value and eventually kill networking on the machine entirely. The reported size was in any case totally out of touch with the number of entries reported by "ip route show cache".
Nothing (echo 1 > /proc/sys/net/ipv4/route/flush, ip route flush cache, flushing the iptables/ebtables rules, bringing the interface down and up, etc.) would get it back to normal. Clearly, there was a leak.

Fix(es):

As I was in quite a hurry (I'm taking off on vacation in a few hours ;) I dug around on Google and found a couple of related reports and fixes, mostly for Linux 2.6. I tried to adapt them to 2.4.36.6, but as I said I was in a rush, so I didn't really know what I was doing. Still, the following patch against ip_fragment() (as I first thought this was a fragmentation-related leak):

diff -ru linux-2.4.36.6.prev/net/ipv4/ip_output.c linux/net/ipv4/ip_output.c
--- linux-2.4.36.6.prev/net/ipv4/ip_output.c	2008-06-06 18:25:34.000000000 +0200
+++ linux/net/ipv4/ip_output.c	2008-08-14 21:39:23.000000000 +0200
@@ -844,6 +844,7 @@
 		if (skb->sk)
 			skb_set_owner_w(skb2, skb->sk);
+		dst_release(skb2->dst);
 		skb2->dst = dst_clone(skb->dst);
 		skb2->dev = skb->dev;

improved things a little. Not much, really, but the rt cache size figure did seem to come back down a bit more often than before; the trend nevertheless remained a steady, unstoppable growth. But then this patch, backported from http://www.ssi.bg/~ja/brnf_dst-2.6.20-1.diff and patching br_nf_pre_routing_finish():

diff -ru linux-2.4.36.6.prev/net/bridge/br_netfilter.c linux/net/bridge/br_netfilter.c
--- linux-2.4.36.6.prev/net/bridge/br_netfilter.c	2008-06-06 18:25:34.000000000 +0200
+++ linux/net/bridge/br_netfilter.c	2008-08-15 00:17:56.000000000 +0200
@@ -216,6 +216,10 @@
 	skb->nf_debug ^= (1 << NF_BR_PRE_ROUTING);
 #endif
 
+	/* Old skb->dst is not expected, it is lost in all cases */
+	dst_release(skb->dst);
+	skb->dst = NULL;
+
 	if (nf_bridge->mask & BRNF_PKT_TYPE) {
 		skb->pkt_type = PACKET_OTHERHOST;
 		nf_bridge->mask ^= BRNF_PKT_TYPE;

really made the whole issue go away.

Note: as my kernel is already quite heavily patched, the above may not apply perfectly cleanly.
Anyway, given that the second patch at least "fixed" the problem as far as I'm concerned (the first one might not even be necessary; I haven't checked that yet), I believe there is a bug in the current ebtables-brnf 2.4 patch, and a nasty one at that, since it DoSes the machine fairly quickly. I hope the above explanation and patches make things clear enough and are helpful; again, I don't really know what I'm doing ;-)

(I've attached both patches to this email in case gmail gets too smart with line wrapping.)

HTH

T-Bone

PS: please CC me on replies, I'm not subscribed, and I'll be offline for the next 4 days.

--
Thibaut VARENE
http://www.parisc-linux.org/~varenet/
Attachment:
dst_overflow.patch
Description: Binary data