Mesh 802.11s blackhole due to bogus mpath routes with nexthop 00:00:00:00:00:00


Hi everyone,

I want to report an issue I am having and ask for guidance on how to
collect more technical information that can help in understanding why
this happens and how to fix it (instead of working around it with
scripts that reload the WiFi when the issue appears).

Kernel in use: 5.4.155
Linux distribution: OpenWrt
Target: confirmed on a MediaTek device from Winstars (mt7603e and
mt7615e drivers), on the Netgear EA8500 and on the Netgear EX6400
The router device I am using has 2 CPUs.

Steps to reproduce: I am not sure what exactly triggers this. The bug
happens on its own periodically. If anyone has suggestions for
specific actions that might reproduce it, please let me know.

What happens:

The connection to the root mesh node is lost, but inspecting the
status of the mesh links with “iw mesh0 station dump” or “iw mesh1
station dump” shows the links are active.

Inspecting “iw mesh0 mpath dump” or “iw mesh1 mpath dump” shows a list
of MAC addresses that belong to devices in the LAN, each with an
invalid next hop (00:00:00:00:00:00). For some reason these end up in
the mesh routing table and fill it.

For example, when the problem starts showing up, the mesh routing
table may look as follows:

iw mesh1 mpath dump
DEST ADDR         NEXT HOP          IFACE SN METRIC QLEN EXPTIME DTIM DRET FLAGS HOP_COUNT PATH_CHANGE
16:dd:0c:a4:ba:aa 00:00:00:00:00:00 mesh1 0 0 0 0 1600 4 0x0 0 0
fc:93:c3:3b:0b:fe 00:00:00:00:00:00 mesh1 0 0 0 0 1600 4 0x0 0 0
90:f4:c0:8f:de:80 00:00:00:00:00:00 mesh1 0 0 0 0 1600 4 0x0 0 0
bc:a1:da:cb:87:a8 00:00:00:00:00:00 mesh1 0 0 0 0 1600 4 0x0 0 0
1e:f7:95:47:4a:b3 00:00:00:00:00:00 mesh1 0 0 0 0 1600 4 0x0 0 0
3a:e2:e6:88:65:fb 00:00:00:00:00:00 mesh1 0 0 0 0 1600 4 0x0 0 0
6c:cd:48:37:af:bc 00:00:00:00:00:00 mesh1 0 0 0 0 1600 4 0x0 0 0
d8:54:0b:7c:20:46 00:00:00:00:00:00 mesh1 0 0 0 0 1600 4 0x0 0 0
ce:43:28:84:44:7e 00:00:00:00:00:00 mesh1 0 0 0 0 1600 4 0x0 0 0
26:75:58:0b:39:18 00:00:00:00:00:00 mesh1 0 0 0 0 1600 4 0x0 0 0
80:3f:5d:**:**:** 80:3f:5d:**:**:** mesh1 0 4857 0 0 0 0 0x10 1 1

After one minute, the table may have doubled or tripled in size.
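
To track how fast the table grows, the dump can be parsed and the
entries with an all-zero next hop counted at regular intervals. The
following is only a minimal sketch in Python: the function names are
made up for illustration, it assumes the column layout shown in the
dump above (destination first, next hop second), and it assumes Python
3.7 or later is available on the device or on a host that receives the
output.

```python
import subprocess

ZERO_MAC = "00:00:00:00:00:00"


def parse_mpath_dump(text):
    """Return (dest, next_hop) pairs from `iw <iface> mpath dump` output.

    Assumes the layout shown in the report: the destination address is
    the first field and the next hop is the second field of each row.
    """
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with two MAC-like fields (five colons each);
        # the header line, even when wrapped, does not.
        if (len(fields) >= 2
                and fields[0].count(":") == 5
                and fields[1].count(":") == 5):
            entries.append((fields[0].lower(), fields[1].lower()))
    return entries


def bogus_destinations(entries):
    """Return the destinations whose next hop is 00:00:00:00:00:00."""
    return [dest for dest, next_hop in entries if next_hop == ZERO_MAC]


def dump_mpath(ifname):
    """Run `iw <ifname> mpath dump`, as in the report, and return its output."""
    result = subprocess.run(["iw", ifname, "mpath", "dump"],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Calling bogus_destinations(parse_mpath_dump(dump_mpath("mesh1"))) once
per interval and logging the length of the result with a timestamp
would show when the growth starts and how quickly it proceeds.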

At some point, one of the mesh nodes itself ends up in the routing
table with a bogus route (next hop 00:00:00:00:00:00), which breaks
routing completely. Until that happens, the other bogus mesh routes do
not cause issues. Once the black hole appears, however, removing the
bogus mesh routes does not fix it; only turning WiFi off and then on
again does.
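
To catch the moment a mesh node itself gets a bogus route, the
addresses of the known mesh nodes can be compared against the
destinations that have an all-zero next hop. This is a rough sketch
under stated assumptions, not a tested tool: the function names are
hypothetical, it assumes each peer in the station dump is introduced
by a line starting with "Station <MAC>", and it assumes the mpath dump
has the destination in the first field and the next hop in the second.
Direct peers are only one source of mesh node addresses; the address
of a root node several hops away would have to be added by hand.

```python
ZERO_MAC = "00:00:00:00:00:00"


def peer_macs(station_dump):
    """Collect peer addresses from `iw <iface> station dump` output.

    Assumes each peer block begins with a line like
    "Station aa:bb:cc:dd:ee:ff (on mesh0)".
    """
    peers = set()
    for line in station_dump.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "Station":
            peers.add(fields[1].lower())
    return peers


def zero_next_hop_dests(mpath_dump):
    """Collect destinations whose next hop is 00:00:00:00:00:00."""
    dests = set()
    for line in mpath_dump.splitlines():
        fields = line.split()
        if (len(fields) >= 2
                and fields[0].count(":") == 5
                and fields[1] == ZERO_MAC):
            dests.add(fields[0].lower())
    return dests


def blackholed_nodes(mesh_node_macs, mpath_dump):
    """Return the known mesh nodes that currently have a bogus route."""
    known = {mac.lower() for mac in mesh_node_macs}
    return known & zero_next_hop_dests(mpath_dump)
```

A non-empty result from blackholed_nodes(peer_macs(stations), mpaths)
would be a good trigger for saving both dumps and the kernel log
before WiFi is restarted, so that the state at the time of the failure
is preserved.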

I would be grateful for any suggestion on how to collect more useful
information that can help to track down and fix this bug.

Best regards
Federico Capoano
OpenWISP Project



