I've been fighting a kernel bug that produces random crashes around the network / skb layer for a long time, and I have been able to isolate it (or one of its components) to the br_netfilter module.

I am reproducing the bug on PowerPC (TL-WDR4900v1.3) and MIPS (DB120, ar71xx) based systems. Florian Westphal did not see it on kvm/x86, so it is unclear whether it requires a physical system or is CPU specific.

The bug is present in the latest OpenWRT (tested HEAD is 03b15ae9), and it also happens with firmware built 2+ years ago, so it is not a recent regression but something that has been there for a long time.

Reproducing the crash
1. build the firmware for the system to test
   * use the default configuration
   * make sure CONFIG_BRIDGE_NETFILTER is selected in kernel_menuconfig
2. boot the device and access it over serial
3. ensure the br-lan bridge has at least two active ports
   * tested with ath9k + Ethernet (gianfar and ag71xx)
   * if not enabled, enable radio0 and ensure wlan0 is in the bridge
4. run: sysctl -w net.bridge.bridge-nf-call-iptables=1
5. from your host, continuously ping the device over Ethernet
6. run: ifconfig br-lan down

The next ingress packet causes a fatal crash. Trace logs for MIPS and PPC are attached; both point to __nf_conntrack_confirm.

Let me know if I can provide more information to further isolate the problem.

Thanks,
Zefir
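Steps 4 and 6 are the only commands run on the device. If the trigger needs to be scripted from a small C helper (for example on an image without the busybox sysctl/ifconfig applets), they roughly boil down to writing "1" to the procfs file behind the sysctl and clearing IFF_UP on the bridge with the SIOCGIFFLAGS/SIOCSIFFLAGS ioctls. The following is a minimal sketch under those assumptions; the function names sysctl_proc_path, write_sysctl and link_down are hypothetical, and error handling is kept to a minimum.

```c
#define _DEFAULT_SOURCE /* struct ifreq is only exposed with the default/BSD feature set */
#include <assert.h>
#include <net/if.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: map a dotted sysctl name to its /proc/sys path,
 * assuming no path component contains a dot itself. */
int sysctl_proc_path(const char *name, char *out, size_t len)
{
	const char *prefix = "/proc/sys/";
	int n;

	assert(name != NULL && out != NULL);
	n = snprintf(out, len, "%s%s", prefix, name);
	if (n < 0 || (size_t)n >= len)
		return -1; /* output would be truncated */
	for (char *p = out + strlen(prefix); *p != '\0'; p++)
		if (*p == '.')
			*p = '/';
	return 0;
}

/* Step 4: roughly "sysctl -w <name>=<value>". */
int write_sysctl(const char *name, const char *value)
{
	char path[256];
	FILE *f;

	if (sysctl_proc_path(name, path, sizeof(path)) < 0)
		return -1;
	f = fopen(path, "w");
	if (f == NULL)
		return -1;
	if (fputs(value, f) == EOF) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}

/* Step 6: roughly "ifconfig <ifname> down", i.e. read the interface
 * flags, clear IFF_UP and write them back. */
int link_down(const char *ifname)
{
	struct ifreq ifr;
	int fd, ret = -1;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;
	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
	if (ioctl(fd, SIOCGIFFLAGS, &ifr) == 0) {
		ifr.ifr_flags &= ~IFF_UP;
		if (ioctl(fd, SIOCSIFFLAGS, &ifr) == 0)
			ret = 0;
	}
	close(fd);
	return ret;
}
```

On the device, calling write_sysctl("net.bridge.bridge-nf-call-iptables", "1") and then link_down("br-lan") while the host keeps pinging should correspond to steps 4 and 6; both need root.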
[ 191.321163] br-lan: port 1(eth0.1) entered disabled state
[ 192.646656] CPU 0 Unable to handle kernel paging request at virtual address 00200200, epc == 87000670, ra == 870018f4
[ 192.657446] Oops[#1]:
[ 192.659761] CPU: 0 PID: 0 Comm: swapper Not tainted 4.1.16 #1
[ 192.665593] task: 803ce958 ti: 803c8000 task.ti: 803c8000
[ 192.671069] $ 0 : 00000000 00000000 80000001 00200200
[ 192.676410] $ 4 : 86c0fa20 00000001 00000000 a44465b9
[ 192.681742] $ 8 : 86c0fa78 86c0fa78 00000000 00000000
[ 192.687075] $12 : 115f0002 00000000 00000000 c0a80114
[ 192.692408] $16 : 86c0fa20 000006cc 000007b6 803e5af0
[ 192.697742] $20 : 000006cc 00000004 803e5af0 00000000
[ 192.703082] $24 : 00000000 871367d4
[ 192.708416] $28 : 803c8000 803c9a28 86c0fa60 870018f4
[ 192.713750] Hi : 000007b6
[ 192.716670] Lo : b5a74800
[ 192.719628] epc : 87000670 nf_conntrack_find_get+0x68/0x88 [nf_conntrack]
[ 192.726698] ra : 870018f4 __nf_conntrack_confirm+0xc0/0x364 [nf_conntrack]
[ 192.733927] Status: 1100fc03 KERNEL EXL IE
[ 192.738196] Cause : 8080000c
[ 192.741117] BadVA : 00200200
[ 192.744040] PrId : 0001974c (MIPS 74Kc)
[ 192.748015] Modules linked in: ath9k ath9k_common pppoe ppp_async iptable_nat ath9k_hw ath pppox ppp_generic nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 mac80211 ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nan
[ 192.816284] Process swapper (pid: 0, threadinfo=803c8000, task=803ce958, tls=00000000)
[ 192.824311] Stack : 87342240 87135744 00000001 02000000 803c9aac 803c9aec 803cabac 87342240
          00000001 00000004 00000003 ffffff62 00000000 8026efb0 86c0fa20 87342240
          00000000 8731f000 86c18100 87137058 00000000 87342240 803c9aec 87342240
          803cab24 fffffffb 00000001 8026f090 8734ca80 87865b7c 00000000 00000008
          00000000 87137058 803caba4 87342240 00000001 8731f000 87342240 8731f05c
          ...
[ 192.860643] Call Trace:
[ 192.863133] [<87000670>] nf_conntrack_find_get+0x68/0x88 [nf_conntrack]
[ 192.869850]
[ 192.871356] Code: 00020336 8c820008 30450001 <14a00002> ac620000 ac430004 3c020020 24420200 ac82000c
[ 192.881512] ---[ end trace 1e716eb17e40af8b ]---
[ 192.888247] Kernel panic - not syncing: Fatal exception in interrupt
[ 192.895654] Rebooting in 3 seconds..
[ 69.834129] br0: port 3(eth1) entered disabled state
[ 69.835427] br0: port 1(wlan0) entered disabled state
[ 77.493530] Unable to handle kernel paging request for data at address 0x00200200
[ 77.495415] Faulting instruction address: 0xd32ce874
[ 77.496669] Oops: Kernel access of bad area, sig: 11 [#1]
[ 77.498027] DT50
[ 77.498493] Modules linked in: ath9k ath9k_common iptable_nat ath9k_hw ath nf_nat_ipv4 nf_conntrack_ipv4 mac80211 ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_tcpmss xt_string xt_statistic xt_state xt_recent xt_quota xt_pkh
[ 77.522830] CPU: 0 PID: 0 Comm: swapper Not tainted 3.18.23 #10
[ 77.524323] task: c035b300 ti: cffe6000 task.ti: c0370000
[ 77.525684] NIP: d32ce874 LR: d32cffec CTR: d35b13c4
[ 77.526936] REGS: cffe7c60 TRAP: 0300 Not tainted (3.18.23)
[ 77.528403] MSR: 00029000 <CE,EE,ME> CR: 42002082 XER: 20000000
[ 77.529951] DEAR: 00200200 ESR: 00800000
GPR00: c762b218 cffe7d10 c035b300 c762b1c0 8db8d32d d72044d0 00000000 00000000
GPR08: 00000001 80000001 00200200 332f4b8b 22002082 10025420 00200000 c7654080
GPR16: c7504d80 c7b7e540 cfba4678 c7b4a000 000086dd 00000000 80000000 00000002
GPR24: c762b200 00000e25 000002b1 c0366fc8 00000225 000006b1 00000000 c762b1c0
[ 77.538144] NIP [d32ce874] 0xd32ce874
[ 77.539075] LR [d32cffec] __nf_conntrack_confirm+0x2c8/0x34c [nf_conntrack]
[ 77.540826] Call Trace:
[ 77.541449] [cffe7d10] [d35b9760] nf_nat_ipv4_fn+0x13c/0x200 [nf_nat_ipv4] (unreliable)
[ 77.543478] [cffe7d40] [c022640c] nf_iterate+0x70/0xc0
[ 77.544778] [cffe7d80] [c02264d8] nf_hook_slow+0x7c/0x124
[ 77.546143] [cffe7dc0] [c022c914] ip_local_deliver+0x98/0xbc
[ 77.547578] [cffe7dd0] [c01eee60] __netif_receive_skb_core+0x668/0x79c
[ 77.549228] [cffe7e30] [c01f0d84] netif_receive_skb_internal+0x60/0x84
[ 77.550884] [cffe7e50] [c0285784] br_handle_frame+0x21c/0x32c
[ 77.552337] [cffe7e70] [c01eece0] __netif_receive_skb_core+0x4e8/0x79c
[ 77.553985] [cffe7ed0] [c01f0d84] netif_receive_skb_internal+0x60/0x84
[ 77.555638] [cffe7ef0] [d328c764] gfar_clean_rx_ring+0x39c/0x2b7c [gianfar_driver]
[ 77.557551] [cffe7f40] [d328c9cc] gfar_clean_rx_ring+0x604/0x2b7c [gianfar_driver]
[ 77.559462] [cffe7f60] [c01f100c] net_rx_action+0x74/0x188
[ 77.560857] [cffe7f90] [c0024508] __do_softirq+0xa8/0x1a8
[ 77.562222] [cffe7fe0] [c00247f4] irq_exit+0x4c/0x64
[ 77.563479] [cffe7ff0] [c000c198] call_do_irq+0x24/0x3c
[ 77.564802] [c0371e80] [c0004280] do_IRQ+0x74/0xb0
[ 77.566016] [c0371ea0] [c000d7a8] ret_from_except+0x0/0x18
[ 77.567408] --- interrupt: 501 at arch_cpu_idle+0x24/0x60
[ 77.567408]     LR = arch_cpu_idle+0x24/0x60
[ 77.569849] [c0371f60] [c004ddbc] rcu_idle_enter+0x80/0xa8 (unreliable)
[ 77.571529] [c0371f70] [c0042e88] cpu_startup_entry+0xec/0x218
[ 77.573005] [c0371fb0] [c0337980] start_kernel+0x304/0x318
[ 77.574390] [c0371ff0] [c0000394] set_ivor+0x120/0x15c
[ 77.575685] Instruction dump:
[ 77.576439] 5484703e 7d445050 7d434a78 554ac03e 7c6a1850 4e800020 8143000c 7d490034
[ 77.578424] 5529d97e 0f090000 81230008 71280001 <912a0000> 40820008 91490004 3d200020
[ 77.580453] ---[ end trace d093fabfbc25455c ]---
[ 77.582973]
[ 78.573305] Kernel panic - not syncing: Fatal exception in interrupt
[ 78.883261] mtdoops: ready 215, 216 (no erase)
[ 78.884382] Rebooting in 3 seconds..
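One more observation on the two traces, offered as a reading rather than a diagnosis: both fault on address 0x00200200, which on the 32-bit 3.18/4.1 kernels used here matches LIST_POISON2 from include/linux/poison.h. The code dumps also look like an inlined hlist_nulls_del(). On MIPS the faulting store is sw v0,0(v1) (ac620000, in the delay slot of the marked branch; Cause has the BD bit set) with $3 = 00200200, and on PowerPC it is stw r9,0(r10) (912a0000) with r10 = 00200200. Both correspond to the "*pprev = next" step of the unlink, and both dumps go on to load 0x00200200 (lui/addiu on MIPS, lis on PowerPC) to poison pprev. If this is right, a conntrack hash node is being unlinked a second time after it was already removed from its chain. The sketch below is a simplified userspace mirror of that unlink, not kernel code: the struct mirrors the kernel's hlist_nulls_node, sketch_hlist_nulls_del and double_unlink_target are hypothetical names, and the poison value is hard-coded as an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: LIST_POISON2 as defined in include/linux/poison.h for the
 * 3.18/4.1 kernels in the traces (32-bit, POISON_POINTER_DELTA == 0).
 * Newer kernels use a different value. */
#define LIST_POISON2_ADDR ((uintptr_t)0x00200200u)

/* Userspace mirror of the kernel's struct hlist_nulls_node. */
struct hlist_nulls_node {
	struct hlist_nulls_node *next, **pprev;
};

/* A "nulls" end marker has bit 0 set instead of being a real pointer. */
int is_a_nulls(const struct hlist_nulls_node *p)
{
	return ((uintptr_t)p & 1) != 0;
}

/* Simplified hlist_nulls_del(): splice the node out of its chain, then
 * poison pprev so that a later unlink of the same node faults. */
void sketch_hlist_nulls_del(struct hlist_nulls_node *n)
{
	struct hlist_nulls_node *next = n->next;
	struct hlist_nulls_node **pprev = n->pprev;

	*pprev = next; /* the store that faults on a second unlink */
	if (!is_a_nulls(next))
		next->pprev = pprev;
	n->pprev = (struct hlist_nulls_node **)LIST_POISON2_ADDR;
}

/* Hypothetical helper: unlink the only node of a chain once and return
 * the address a second sketch_hlist_nulls_del() would write to through
 * *pprev. The second unlink is not performed, since it would crash. */
uintptr_t double_unlink_target(void)
{
	struct hlist_nulls_node *first;
	struct hlist_nulls_node node;

	node.next = (struct hlist_nulls_node *)(uintptr_t)1; /* end-of-chain marker */
	node.pprev = &first;
	first = &node;

	sketch_hlist_nulls_del(&node);
	assert(is_a_nulls(first)); /* the bucket is empty again */
	return (uintptr_t)node.pprev;
}
```

double_unlink_target() returns 0x00200200, the same value reported as BadVA (MIPS) and DEAR (PowerPC) in the two oopses above.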