Hi Josua,

On Thu, Jan 10, 2019 at 1:49 PM Josua Mayer <josua.mayer97@xxxxxxxxx> wrote:
>
> Good day once more,
>
> I have now identified a chain of calls leading up to the described
> issue, along with a work-around:
>
> The first step was discovering fishy debug messages in dmesg:
> [ 235.505517] Connecting to first module while both are powered
> [ 237.394085] ifindex 5 peer bdaddr 00:b1:fc:8c:6e:47 type 1 my addr a4:d5:78:11:cf:6f type 1
> [ 241.872089] dest IP fe80::b1:fcff:fe8c:6e47
> [ 241.872104] peers 1 addr fe80::b1:fcff:fe8c:6e47 rt (null)
> [ 241.872124] xmit bt0 to 00:b1:fc:8c:6e:47 type 1 IP 6800::600 chan (ptrval)
> [ 242.356387] Connecting to second module
> [ 244.073592] dest IP fe80::b1:fcff:fe8c:6e47
> [ 244.073603] peers 2 addr fe80::b1:fcff:fe8c:6e47 rt (null)
> [ 244.073607] no such peer
>
> We can see here how the first module is connected, and packets to its
> link-local address are being transmitted.
> Then the second module is connected and the number of peers updates to 2.
> Now we have another packet for the first module's link-local address. We
> know there are 2 peers, but for some reason we get a message saying "no
> such peer"!
>
> Luckily this message was easy to trace:
> See net/bluetooth/6lowpan.c:setup_header
> This message is a direct result of a previous call to peer_lookup_dst
> returning NULL.
> Now while reviewing peer_lookup_dst, keep in mind that we are looking
> for a difference in behaviour between the case of exactly one peer and
> the case of at least two peers.
> Let me just quote here:
>         if (count == 1) {
>                 peer = list_first_or_null_rcu(&dev->peers, struct lowpan_peer, list);
>                 return peer;
>         }
>
> If there is only one peer, no checks are performed at all; it is simply
> assumed that this peer must be the one to receive packets for the given
> address.
> So this is the one-peer (one module connected) case, which works
> just fine.
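To make the one-peer versus two-peer difference concrete, here is a minimal user-space sketch of the lookup as it is described in this thread. It is not the kernel code: `struct addr`, `struct peer`, `addr_any` and `lookup_dst` are hypothetical stand-ins, the route is modelled as an optional next-hop pointer, and the final loop that compares each peer address against the next hop is my assumption about what the rest of peer_lookup_dst does.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for struct in6_addr and struct lowpan_peer. */
struct addr { unsigned char b[16]; };
struct peer { struct addr peer_addr; };

/* True if the address is all zeros (models ipv6_addr_any()). */
static bool addr_any(const struct addr *a)
{
        static const struct addr zero = { { 0 } };

        return memcmp(a, &zero, sizeof(zero)) == 0;
}

/*
 * Model of the lookup: route_nexthop is NULL when no route is known
 * (the "rt (null)" case in the dmesg output), gw models lowpan_cb(skb)->gw.
 */
static const struct peer *lookup_dst(const struct peer *peers, size_t count,
                                     const struct addr *route_nexthop,
                                     const struct addr *gw)
{
        const struct addr *nexthop;
        size_t i;

        /* Single peer: returned without comparing any address. */
        if (count == 1)
                return &peers[0];

        if (!route_nexthop) {
                nexthop = gw;
                /* No route and no gateway: give up before searching. */
                if (addr_any(nexthop))
                        return NULL;
        } else {
                nexthop = route_nexthop;
        }

        /* Assumed: match the next hop against every known peer. */
        for (i = 0; i < count; i++)
                if (memcmp(&peers[i].peer_addr, nexthop, sizeof(*nexthop)) == 0)
                        return &peers[i];

        return NULL;
}
```

With two peers, no route and an all-zero gateway, this model returns NULL before looking at any peer, which would correspond to the "no such peer" message above.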
>
> Then follows a curious case that I do not fully understand:
> if no route is known, and no gateway was specified in packet data, do
> not even search for the right peer, simply return 0:
>         if (!rt) {
>                 nexthop = &lowpan_cb(skb)->gw;
>
>                 if (ipv6_addr_any(nexthop))
>                         return NULL;
>         }
> ^^ I believe this decision is wrong.
> There might be neither route nor gateway, if the destination is a peer.
>
> I have come up with the following work-around:
> -               nexthop = &lowpan_cb(skb)->gw;
> -
> -               if (ipv6_addr_any(nexthop))
> -                       return NULL;
> +               if (ipv6_addr_any(&lowpan_cb(skb)->gw)) {
> +                       /* There is neither route nor gateway,
> +                        * probably the destination is a direct peer.
> +                        */
> +                       nexthop = daddr;
> +               } else {
> +                       /* There is a known gateway
> +                        */
> +                       nexthop = &lowpan_cb(skb)->gw;
> +               }
> I am submitting this patch separately as:
> [RFC] bluetooth_6lowpan: search for destination address in all peers
> It is by no means finished and meant to illustrate the core issue, and
> allow for a discussion around the control logic, and purpose of
> Please comment if I have understood the purpose of the peer_lookup_dst
> function.
> I might even suggest removing the special handling of one peer ... .

I like this version better, but apparently the patch you have sent
matches only part of the address; I am not sure why you refactored
that. If I recall correctly, peer_lookup_dst exists because we need to
resolve the channel on which to send the packets.

> Yours sincerely
> Josua Mayer
>
>
> On 08.01.19 at 19:57, Josua Mayer wrote:
> > Greetings everybody,
> >
> > I want to present to you an issue I am having with the 6LoWPAN over
> > BLE facility in the kernel.
> > I have reached the point where I don't know where, what and how to debug
> > the situation and am hoping for some advice here:
> >
> > First an overview of the setup:
> > 1. an SBC with BLE capable Bluetooth chip
> > 2. multiple Nordic nRF52840 modules
> >
> > This is the problem I have observed:
> > 1. One Nordic module is powered - SBC connects to it
> >    --> ping6 works flawlessly till the module restarts
> >    --> communication with the remote server works as expected
> >
> > 2. Two Nordic modules are powered - SBC connects only to one at a time
> >    --> ping6 works flawlessly till the module restarts
> >    --> communication with the remote server works as expected
> >
> > 3. Two Nordic modules are powered - SBC connects to both
> >    --> ping6 receives no more replies as soon as the second module is connected
> >    --> communication to the remote server stops as soon as the second
> >        module is connected
> >
> > Test Case:
> > rfkill unblock 0
> > modprobe bluetooth_6lowpan
> > echo -n 'module bluetooth_6lowpan +p' > /sys/kernel/debug/dynamic_debug/control
> >
> > while true; do ping6 -c 1 -I bt0 fe80::b1:fcff:fe8c:6e47 || true; sleep 1; done
> >
> > echo "Connecting to first module while both are powered" > /dev/kmsg
> > echo "connect 00:B1:FC:8C:6E:47 1" > /sys/kernel/debug/bluetooth/6lowpan_control
> > # sit back and watch pings till module restarts
> > echo "Connecting to first module while both are powered" > /dev/kmsg
> > echo "connect 00:B1:FC:8C:6E:47 1" > /sys/kernel/debug/bluetooth/6lowpan_control
> > # wait till first ping goes through
> > echo "Connecting to second module" > /dev/kmsg
> > echo "connect 00:39:D3:29:92:1C 1" > /sys/kernel/debug/bluetooth/6lowpan_control
> > # Expected: ping6 continues to receive replies
> > # Actual result: ping6 times out
> >
> > Please see attached dmesg.log from this test case, with dynamic
> > debugging enabled for module bluetooth_6lowpan.
> >
> > The Nordic modules are programmed to advertise themselves for
> > establishing a connection; then they start communication with a server
> > on the internet over IPv6. Finally they are rebooted by a watchdog.
> > While a module is connected, it can be pinged by its link-local address
> > which is derived from its MAC address and thereby known.
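For reference, the relation between the module's address and the link-local address used in the test case can be sketched as below. This is only an illustration under the assumption that the interface identifier is formed by inserting ff:fe in the middle of the 48-bit address, without flipping the universal/local bit, which is what the pair 00:B1:FC:8C:6E:47 and fe80::b1:fcff:fe8c:6e47 from this thread suggests. `bdaddr_to_link_local` is a hypothetical helper, not a kernel function, and it takes the address bytes in the order they are printed.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: build the 16-byte link-local IPv6 address for a
 * 48-bit Bluetooth address given in display order (bd[0] is the byte
 * printed first). Assumes no universal/local bit inversion.
 */
static void bdaddr_to_link_local(const uint8_t bd[6], uint8_t ip6[16])
{
        memset(ip6, 0, 16);

        /* fe80::/64 prefix */
        ip6[0] = 0xfe;
        ip6[1] = 0x80;

        /* Interface identifier: first three bytes, ff:fe, last three. */
        ip6[8]  = bd[0];
        ip6[9]  = bd[1];
        ip6[10] = bd[2];
        ip6[11] = 0xff;
        ip6[12] = 0xfe;
        ip6[13] = bd[3];
        ip6[14] = bd[4];
        ip6[15] = bd[5];
}
```

Calling it with the bytes 00 b1 fc 8c 6e 47 yields the 16 bytes of fe80::b1:fcff:fe8c:6e47, the address pinged in the test case.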
> >
> > As you may have noticed I just wrote "SBC" above.
> > That is because I have done this experiment with 3 different SBCs:
> > 1. SolidRun HummingBoard with i.MX6 uSOM Revision 1.5
> >    features TI WL18MODGB combined WiFi and Bluetooth module
> >    - linux-image-4.20.0-trunk-armmp_4.20-1~exp2_armhf.deb
> >      (+BT_HCIUART=m, +BT_HCIUART_LL=y, +DYNAMIC_DEBUG=y)
> >    ^^ This system was used to produce the attached dmesg.log
> >
> > 2. RaspberryPi 3B
> > 3. RaspberryPi 3B+
> >    - rpi-4.15.y (from their github)
> >    - rpi-4.16.y (from their github)
> >    - rpi-4.17.y (from their github)
> >    - rpi-4.18.y (from their github)
> >    - rpi-4.19.y (from their github)
> >    - rpi-4.20.y (from their github)
> >    bcm2709_defconfig
> >    zImage modules dtbs -j12
> >    gcc-linaro-7.3.1-2018.05-x86_64_arm-linux-gnueabihf
> >
> > rpi-4.14.y suffers from a busy kworker (Workqueue: hci0 hci_rx_work
> > [bluetooth]) making tests difficult.
> >
> > Supposedly back in 4.4.8-v7 on raspberrypi this issue with multiple
> > peers did not exist, while the busy kworker did pop up after some time,
> > requiring a reboot.
> > I did not verify or test with that rather old version yet.
> > Would it be a good idea to start from that 4.4.8 rpi fork working up to
> > 4.15 to find the place where it broke? I feel like this kind of work is
> > difficult when forks are involved.
> >
> > Are there any components of the kernel in particular that could be verified
> > in order to figure out what is going wrong?
> >
> >
> > Yours sincerely
> > Josua Mayer


--
Luiz Augusto von Dentz