Re: Multiple peers with bluetooth_6lowpan

Josua Mayer <josua.mayer97@xxxxxxxxx> · Thu, 10 Jan 2019 17:48:00 +0100

Good day once more,

I have now identified a chain of calls leading up to the described
issue, along with a work-around:

The first step was discovering fishy debug messages in dmesg:
[  235.505517] Connecting to first module while both are powered
[  237.394085] ifindex 5 peer bdaddr 00:b1:fc:8c:6e:47 type 1 my addr a4:d5:78:11:cf:6f type 1
[  241.872089] dest IP fe80::b1:fcff:fe8c:6e47
[  241.872104] peers 1 addr fe80::b1:fcff:fe8c:6e47 rt   (null)
[  241.872124] xmit bt0 to 00:b1:fc:8c:6e:47 type 1 IP 6800::600 chan (ptrval)
[  242.356387] Connecting to second module
[  244.073592] dest IP fe80::b1:fcff:fe8c:6e47
[  244.073603] peers 2 addr fe80::b1:fcff:fe8c:6e47 rt   (null)
[  244.073607] no such peer

We can see here how the first module is connected, and packets to its
link-local address are being transmitted.
Then the second module is connected and the number of peers updates to 2.
Now we have another packet for the first module's link-local address. We
know there are 2 peers, but for some reason we get a message saying "no
such peer"!

Luckily this message was easy to trace:
See net/bluetooth/6lowpan.c:setup_header
This message is a direct result of a previous call to peer_lookup_dst
returning null.
Now while reviewing peer_lookup_dst, keep in mind that we are looking
for a difference in behaviour between the case of one peer and the case
of at least two peers.
Let me just quote here:
	if (count == 1) {
		peer = list_first_or_null_rcu(&dev->peers, struct lowpan_peer, list);
		return peer;
	}

If there is only one peer, no checks are performed at all; it is simply
assumed that this peer must be the one to receive packets for the given
address.
So this is the one-peer, or one-module-connected, case - which works
just fine.

Then follows a curious case that I do not fully understand:
If no route is known, and no gateway was specified in the packet data,
the function does not even search for the right peer and simply returns
NULL:
	if (!rt) {
		nexthop = &lowpan_cb(skb)->gw;

		if (ipv6_addr_any(nexthop))
			return NULL;
	}
^^ I believe this decision is wrong.
There might be neither route nor gateway, if the destination is a peer.

I have come up with the following work-around:
-		nexthop = &lowpan_cb(skb)->gw;
-
-		if (ipv6_addr_any(nexthop))
-			return NULL;
+		if (ipv6_addr_any(&lowpan_cb(skb)->gw)) {
+			/* There is neither route nor gateway,
+			 * probably the destination is a direct peer.
+			 */
+			nexthop = daddr;
+		} else {
+			/* There is a known gateway
+			 */
+			nexthop = &lowpan_cb(skb)->gw;
+		}
I am submitting this patch separately as:
[RFC] bluetooth_6lowpan: search for destination address in all peers
It is by no means finished; it is meant to illustrate the core issue and
to allow for a discussion of the control logic and the purpose of
peer_lookup_dst.
Please comment on whether I have understood the purpose of that function
correctly.
I might even suggest removing the special handling of one peer ... .

Yours sincerely
Josua Mayer

On 08.01.19 at 19:57, Josua Mayer wrote:
> Greetings everybody,
> 
> I want to present to you an issue I am having with the 6LoWPAN over BLE
> facility in the kernel.
> I have reached the point where I no longer know where, what and how to
> debug, and am hoping for some advice here:
> 
> First an overview of the setup:
> 1. an SBC with BLE capable Bluetooth chip
> 2. multiple Nordic nRF52840 modules
> 
> This is the problem I have observed:
> 1. One Nordic module is powered - SBC connects to it
> --> ping6 works flawlessly till the module restarts
> --> communication with the remote server works as expected
> 
> 2. Two Nordic modules are powered - SBC connects only to one at a time
> --> ping6 works flawlessly till the module restarts
> --> communication with the remote server works as expected
> 
> 3. Two Nordic modules are powered - SBC connects to both
> --> ping6 receives no more replies as soon as the second module is connected
> --> communication to the remote server stops as soon as the second
> module is connected
> 
> Test Case:
> rfkill unblock 0
> modprobe bluetooth_6lowpan
> echo -n 'module bluetooth_6lowpan +p' > /sys/kernel/debug/dynamic_debug/control
> 
> while true; do ping6 -c 1 -I bt0 fe80::b1:fcff:fe8c:6e47 || true; sleep 1; done
> 
> echo "Connecting to first module while both are powered" > /dev/kmsg
> echo "connect 00:B1:FC:8C:6E:47 1" > /sys/kernel/debug/bluetooth/6lowpan_control
> # sit back and watch pings till module restarts
> echo "Connecting to first module while both are powered" > /dev/kmsg
> echo "connect 00:B1:FC:8C:6E:47 1" > /sys/kernel/debug/bluetooth/6lowpan_control
> # wait till first ping goes through
> echo "Connecting to second module" > /dev/kmsg
> echo "connect 00:39:D3:29:92:1C 1" > /sys/kernel/debug/bluetooth/6lowpan_control
> # Expected: ping6 continues to receive replies
> # Actual result: ping6 times out
> 
> Please see attached dmesg.log from this test case, with dynamic
> debugging enabled for module bluetooth_6lowpan.
> 
> The Nordic modules are programmed to advertise themselves for
> establishing a connection; then they start communicating with a server
> on the internet over IPv6. Finally they are rebooted by a watchdog.
> While a module is connected, it can be pinged via its link-local address,
> which is derived from its MAC address and is therefore known.
> 
> As you may have noticed I just wrote "SBC" above.
> That is because I have done this experiment with 3 different SBCs:
> 1. SolidRun HummingBoard with i.MX6 uSOM Revision 1.5
> features Ti WL18MODGB combined WiFi and Bluetooth module
> - linux-image-4.20.0-trunk-armmp_4.20-1~exp2_armhf.deb
> (+BT_HCIUART=m, +BT_HCIUART_LL=y, +DYNAMIC_DEBUG=y)
> ^^ This system was used to produce the attached dmesg.log
> 
> 2. RaspberryPi 3B
> 3. RaspberryPi 3B+
> - rpi-4.15.y (from their github)
> - rpi-4.16.y (from their github)
> - rpi-4.17.y (from their github)
> - rpi-4.18.y (from their github)
> - rpi-4.19.y (from their github)
> - rpi-4.20.y (from their github)
> bcm2709_defconfig
> zImage modules dtbs -j12
> gcc-linaro-7.3.1-2018.05-x86_64_arm-linux-gnueabihf
> 
> rpi-4.14.y suffers from a busy kworker (Workqueue: hci0 hci_rx_work
> [bluetooth]) making tests difficult.
> 
> Supposedly this issue with multiple peers did not exist back in
> 4.4.8-v7 on the Raspberry Pi, while the busy kworker did pop up after
> some time, requiring a reboot.
> I have not yet verified or tested that rather old version.
> Would it be a good idea to start from that 4.4.8 rpi fork and work up to
> 4.15 to find the place where it broke? I feel like this kind of work is
> difficult when forks are involved.
> 
> Are there any components of the kernel in particular that could be verified
> in order to figure out what is going wrong?
> 
> 
> Yours sincerely
> Josua Mayer
>