Bridge deadlock in 2.4.33


Hi,

I hope this is the right list for this.
I recently upgraded from 2.4.20 to 2.4.33, both with the ebtables patch,
running an SMP kernel on a UP box, and now I have the following problem.
My box has some rules in ebtables and iptables and a bridge with 3
ports. When I try to remove eth2, which is connected to other bridges,
from the bridge, the box hangs. I used KDB to catch the trace and got this:

nf_hook_slow+0x75
[bridge]br_send_bpdu+0x19f
[bridge]br_send_config_bpdu+0x189
[bridge]br_transmit_config+0xca
[bridge]br_config_bpdu_generation+0x45
[bridge]br_become_root_bridge+0x48
[bridge]br_stp_disable_port+0x9a
[bridge]__br_del_if+0x3e
[bridge]br_del_if+0x47
[bridge]br_ioctl_device+0x59
[bridge]br_ioctl+0x66
[bridge]br_dev_do_ioctl+0x91
dev_ifsioc+0x420
dev_ioctl+0x262
inet_ioctl+0x1d3
sock_ioctl+0x3f
sys_ioctl+0x104
system_call+0x33

I traced the problem to nf_hook_slow() trying to take a read lock on
BR_NETPROTO_LOCK when br_del_if(), earlier in the stack, has already
taken the write lock.
I also checked: in 2.4.20 br_send_bpdu() called dev_queue_xmit()
directly, whereas in 2.4.33 it goes through netfilter.
I wrote this small patch just to see what would happen:

--- netfilter.c    2006-10-29 18:55:16.000000000 +0200
+++ netfilter.c.new    2006-10-29 18:55:09.000000000 +0200
@@ -486,7 +486,10 @@
   }

   /* We may already have this, but read-locks nest anyway */
-    br_read_lock_bh(BR_NETPROTO_LOCK);
+    if (spin_is_locked(&__br_write_locks[BR_NETPROTO_LOCK].lock))
+        printk(KERN_ERR "nf_hook_slow: BR_NETPROTO_LOCK already locked.\n");
+    else
+        br_read_lock_bh(BR_NETPROTO_LOCK);

#ifdef CONFIG_NETFILTER_DEBUG
   if (unlikely((*pskb)->nf_debug & (1 << hook))) {
@@ -509,7 +512,8 @@
       nf_queue(*pskb, elem, pf, hook, indev, outdev, okfn);
   }

-    br_read_unlock_bh(BR_NETPROTO_LOCK);
+    if (!spin_is_locked(&__br_write_locks[BR_NETPROTO_LOCK].lock))
+        br_read_unlock_bh(BR_NETPROTO_LOCK);
   return ret;
}

With the patch the kernel no longer deadlocks and everything seems OK,
except that when I use brctl to add the interface again it says it can't
enslave the port because it is already part of the bridge, and if I try
to delete it again it says that the port is not part of the bridge. After
about 40 seconds everything is normal again and the interface is no
longer part of the bridge.

I was wondering if this patch is OK as a workaround for this problem or
if there's a better solution.

Thanks.