Hi,

This mailing list gave me a lot of information about this problem, so I
want to share the solution to the bad refcount on the bridge. The fix has
been applied to the kernel:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f3abc9b963e004b8c96cd7fbee6fd905f2bfd620

commit f216f082b2b37c4943f1e7c393e2786648d48f6f
([NETFILTER]: bridge netfilter: deal with martians correctly)
added a refcount leak on in_dev.

Instead of using in_dev_get(), we can use __in_dev_get_rcu(),
as netfilter hooks are running under rcu_read_lock(), as pointed out
by Patrick.

diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index 4fde742..907a82e 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -359,7 +359,7 @@ static int br_nf_pre_routing_finish(struct sk_buff *skb)
 			},
 			.proto = 0,
 		};
-		struct in_device *in_dev = in_dev_get(dev);
+		struct in_device *in_dev = __in_dev_get_rcu(dev);
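
To illustrate why the one-line change removes the leak, here is a small
userspace sketch of the pattern. It is not kernel code: the toy_* types
and helpers are hypothetical stand-ins for struct in_device, in_dev_get(),
in_dev_put() and __in_dev_get_rcu(), and the plain int counter only
models the reference count. It assumes, as in the kernel, that the
"get" variant takes a reference which the caller must drop, while the
RCU variant only returns the pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only. All toy_* names are hypothetical stand-ins,
 * not kernel APIs. */
struct toy_in_device {
	int refcnt;			/* models the in_device refcount */
};

struct toy_net_device {
	struct toy_in_device *ip_ptr;	/* models dev->ip_ptr */
};

/* Models in_dev_get(): returns the pointer AND takes a reference,
 * so the caller owes a matching toy_in_dev_put(). */
struct toy_in_device *toy_in_dev_get(struct toy_net_device *dev)
{
	struct toy_in_device *in_dev = dev->ip_ptr;

	if (in_dev)
		in_dev->refcnt++;
	return in_dev;
}

/* Models in_dev_put(): drops a reference taken by toy_in_dev_get(). */
void toy_in_dev_put(struct toy_in_device *in_dev)
{
	in_dev->refcnt--;
}

/* Models __in_dev_get_rcu(): returns the pointer without touching the
 * count. In the kernel this is only valid inside an RCU read-side
 * section, which the model does not enforce. */
struct toy_in_device *toy_in_dev_get_rcu(struct toy_net_device *dev)
{
	return dev->ip_ptr;
}

/* Leaky pattern: takes a reference and returns without dropping it. */
int leaky_hook(struct toy_net_device *dev)
{
	struct toy_in_device *in_dev = toy_in_dev_get(dev);

	return in_dev != NULL;
}

/* Balanced pattern: every get is paired with a put. */
int balanced_hook(struct toy_net_device *dev)
{
	struct toy_in_device *in_dev = toy_in_dev_get(dev);
	int found = in_dev != NULL;

	if (in_dev)
		toy_in_dev_put(in_dev);
	return found;
}

/* Pattern used by the fix: no reference is taken, so none can leak. */
int fixed_hook(struct toy_net_device *dev)
{
	struct toy_in_device *in_dev = toy_in_dev_get_rcu(dev);

	return in_dev != NULL;
}

/* Runs a hook `calls` times on a fresh device whose in_device starts
 * with one reference, and returns the resulting count. */
int refcnt_after(int (*hook)(struct toy_net_device *), int calls)
{
	struct toy_in_device in_dev = { .refcnt = 1 };
	struct toy_net_device dev = { .ip_ptr = &in_dev };

	assert(calls >= 0);
	for (int i = 0; i < calls; i++)
		hook(&dev);
	return in_dev.refcnt;
}
```

In this model, refcnt_after(leaky_hook, n) grows by one per call, while
balanced_hook and fixed_hook leave the count where it started. The fix
takes the fixed_hook route rather than adding the missing put, which is
safe only because the hook already runs under rcu_read_lock().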

Best Regards,
Jorge.

> Two other recent reports are:
> 1. Buggy applications that hold packets in their input queue forever,
>    and/or netfilters. The socket buffers contain a reference for
>    packets in flight.

That may be it, but I am not sure which queue you are talking about.
There is an application that is using the netfilter ip_queue to queue
packets to user space. In this application, these packets can be held in
user space for extended periods of time (up to 30/60 seconds), and then
they are either dropped or released. Could this possibly be creating a
problem?

I don't believe that the system is using any of the VLAN code.

> I have found an apparent leak of a route object, which holds a
> reference to a device. I reproduced in both 2.6.11 and 2.6.13 using
> 802.1Q VLANs.
> I have a patch that will print out the place of the leaked reference
> against 2.6.13.
>
> http://www.candelatech.com/oss/rfcnt.patch
>
> Enable the feature in the Networking section of Kconfig.

Ben, I will incorporate this patch and let you know if I turn up any
results.

thanks,
--robert

On Aug 31, 2005, at 9:37 PM, Stephen Hemminger wrote:

> On Wed, 31 Aug 2005 19:04:01 -0700
> Robert Scott <rbscott at axentra.net> wrote:
>
>> Hello,
>>
>> I know that this bug has been discussed before at length on this
>> mailing list, but previous posts seemed to indicate that it was fixed
>> before kernel 2.6.12. I am still seeing this occasionally in kernel
>> 2.6.12.3. The system is running knoppix, and IPV6 is not compiled
>> into the kernel (other posts mentioned numerous problems with the
>> IPV6 code). But every so often, when bringing down the bridge (it
>> doesn't happen every time), the process hangs, and the following
>> message appears in dmesg repeatedly:
>>
>> 'unregister_netdevice: waiting for br0 to become free. Usage count = 1'
>>
>> None of the processes involved can be killed, and an attempt to run
>> an ifconfig results in a process that is also waiting forever. At
>> this point the box must be rebooted forcefully.
>>
>> Two questions.
>> 1. In a previous post, someone mentioned one solution was to
>> comment out the check that is hanging in the kernel. Does this
>> check prevent something terrible from happening (I assumed that it
>> does), or is it safe to remove it?
>>
>
> Really bad idea, because if the thing that is holding the reference,
> like packets stuck in some dead queue, ever gets processed, the kernel
> will die.
>
>> 2. Any ideas of something to try in order to make this repeatable?
>>
>
> Two other recent reports are:
> 1. Buggy applications that hold packets in their input queue forever,
>    and/or netfilters. The socket buffers contain a reference for
>    packets in flight.
>
> 2. The VLAN code had a number of reference bugs; if you look through
>    the recent netdev mailing list you will see the discussion.
>