Hi Luiz,

On 10.01.19 at 19:01, Luiz Augusto von Dentz wrote:
> Hi Josua,
>
> On Thu, Jan 10, 2019 at 1:49 PM Josua Mayer <josua.mayer97@xxxxxxxxx> wrote:
>>
>> Good day once more,
>>
>> I have now identified a chain of calls leading up to the described
>> issue, along with a work-around:
>>
>> The first step was discovering fishy debug messages in dmesg:
>> [ 235.505517] Connecting to first module while both are powered
>> [ 237.394085] ifindex 5 peer bdaddr 00:b1:fc:8c:6e:47 type 1 my addr a4:d5:78:11:cf:6f type 1
>> [ 241.872089] dest IP fe80::b1:fcff:fe8c:6e47
>> [ 241.872104] peers 1 addr fe80::b1:fcff:fe8c:6e47 rt (null)
>> [ 241.872124] xmit bt0 to 00:b1:fc:8c:6e:47 type 1 IP 6800::600 chan (ptrval)
>> [ 242.356387] Connecting to second module
>> [ 244.073592] dest IP fe80::b1:fcff:fe8c:6e47
>> [ 244.073603] peers 2 addr fe80::b1:fcff:fe8c:6e47 rt (null)
>> [ 244.073607] no such peer
>>
>> We can see here how the first module is connected, and packets to its
>> link-local address are being transmitted.
>> Then the second module is connected and the number of peers updates to 2.
>> Now we have another packet for the first module's link-local address. We
>> know there are 2 peers, but for some reason we get a message saying "no
>> such peer"!
>>
>> Luckily this message was easy to trace:
>> See net/bluetooth/6lowpan.c:setup_header
>> This message is a direct result of a previous call to peer_lookup_dst
>> returning NULL.
>> Now while reviewing peer_lookup_dst, keep in mind that we are looking
>> for a difference in behaviour between the case with one peer and the
>> case with at least two peers.
>> Let me just quote here:
>>     if (count == 1) {
>>         peer = list_first_or_null_rcu(&dev->peers, struct lowpan_peer, list);
>>         return peer;
>>     }
>>
>> If there is only one peer, no checks are performed at all; it is simply
>> assumed that this peer must be the one to receive packets for the given
>> address.
>> So this is the one-peer, or one-module-connected, case - which works
>> just fine.
>>
>> Then follows a curious case that I do not fully understand:
>> if no route is known, and no gateway was specified in packet data, do
>> not even search for the right peer, simply return NULL:
>>     if (!rt) {
>>         nexthop = &lowpan_cb(skb)->gw;
>>
>>         if (ipv6_addr_any(nexthop))
>>             return NULL;
>>     }
>> ^^ I believe this decision is wrong.
>> There might be neither route nor gateway, if the destination is a peer.
>>
>> I have come up with the following work-around:
>> -        nexthop = &lowpan_cb(skb)->gw;
>> -
>> -        if (ipv6_addr_any(nexthop))
>> -            return NULL;
>> +        if (ipv6_addr_any(&lowpan_cb(skb)->gw)) {
>> +            /* There is neither route nor gateway,
>> +             * probably the destination is a direct peer.
>> +             */
>> +            nexthop = daddr;
>> +        } else {
>> +            /* There is a known gateway
>> +             */
>> +            nexthop = &lowpan_cb(skb)->gw;
>> +        }
>> I am submitting this patch separately as:
>> [RFC] bluetooth_6lowpan: search for destination address in all peers
>> It is by no means finished; it is meant to illustrate the core issue, and
>> allow for a discussion around the control logic, and purpose of
>> Please comment if I have understood the purpose of the peer_lookup_dst
>> function.
>> I might even suggest removing the special handling of one peer ... .
>
> I like this version better but apparently the patch you have sent is
> only matching part of the address, not sure why you had refactored
> that. If I recall the reason why peer_lookup_dst exists is that we
> need to resolve the channel where to send the packets.

Actually I didn't refactor. The two patches handle two different issues.
The first one deals with what I described in this topic;
the second patch is very hackish and meant to illustrate the next step,
where we want to talk to dynamically assigned addresses.

I believe the cache of known neighbours should be checked for the MAC
address, which in turn could be the search criterion for peers.
ip -6 neighb
2004::39:d3ff:fe29:921c dev bt0 lladdr 00:39:d3:29:92:1c REACHABLE
2004::b1:fcff:fe8c:6e47 dev bt0 lladdr 00:b1:fc:8c:6e:47 REACHABLE

>
>> Yours sincerely
>> Josua Mayer
>>
>>
>> On 08.01.19 at 19:57, Josua Mayer wrote:
>>> Greetings everybody,
>>>
>>> I want to present to you an issue I am having with the 6LoWPAN over BLE
>>> facility in the kernel.
>>> I have reached the point where I don't know where, what and how to debug
>>> the situation and am hoping for some advice here:
>>>
>>> First an overview of the setup:
>>> 1. an SBC with BLE capable Bluetooth chip
>>> 2. multiple Nordic nRF52840 modules
>>>
>>> This is the problem I have observed:
>>> 1. One Nordic module is powered - SBC connects to it
>>> --> ping6 works flawlessly till the module restarts
>>> --> communication with the remote server works as expected
>>>
>>> 2. Two Nordic modules are powered - SBC connects only to one at a time
>>> --> ping6 works flawlessly till the module restarts
>>> --> communication with the remote server works as expected
>>>
>>> 3. Two Nordic modules are powered - SBC connects to both
>>> --> ping6 receives no more replies as soon as the second module is connected
>>> --> communication to the remote server stops as soon as the second
>>> module is connected
>>>
>>> Test Case:
>>> rfkill unblock 0
>>> modprobe bluetooth_6lowpan
>>> echo -n 'module bluetooth_6lowpan +p' > /sys/kernel/debug/dynamic_debug/control
>>>
>>> while true; do ping6 -c 1 -I bt0 fe80::b1:fcff:fe8c:6e47 || true; sleep 1; done
>>>
>>> echo "Connecting to first module while both are powered" > /dev/kmsg
>>> echo "connect 00:B1:FC:8C:6E:47 1" > /sys/kernel/debug/bluetooth/6lowpan_control
>>> # sit back and watch pings till module restarts
>>> echo "Connecting to first module while both are powered" > /dev/kmsg
>>> echo "connect 00:B1:FC:8C:6E:47 1" > /sys/kernel/debug/bluetooth/6lowpan_control
>>> # wait till first ping goes through
>>> echo "Connecting to second module" > /dev/kmsg
>>> echo "connect 00:39:D3:29:92:1C 1" > /sys/kernel/debug/bluetooth/6lowpan_control
>>> # Expected: ping6 continues to receive replies
>>> # Actual result: ping6 times out
>>>
>>> Please see attached dmesg.log from this test case, with dynamic
>>> debugging enabled for module bluetooth_6lowpan.
>>>
>>> The Nordic modules are programmed to advertise themselves for
>>> establishing a connection. Then they start communication with a server
>>> on the internet over IPv6. Finally they are rebooted by a watchdog.
>>> While a module is connected, it can be pinged by its link-local address,
>>> which is derived from its MAC address and thereby known.
>>>
>>> As you may have noticed I just wrote "SBC" above.
>>> That is because I have done this experiment with 3 different SBCs:
>>> 1. SolidRun HummingBoard with i.MX6 uSOM Revision 1.5
>>>    features TI WL18MODGB combined WiFi and Bluetooth module
>>>    - linux-image-4.20.0-trunk-armmp_4.20-1~exp2_armhf.deb
>>>      (+BT_HCIUART=m, +BT_HCIUART_LL=y, +DYNAMIC_DEBUG=y)
>>>    ^^ This system was used to produce the attached dmesg.log
>>>
>>> 2. RaspberryPi 3B
>>> 3. RaspberryPi 3B+
>>>    - rpi-4.15.y (from their github)
>>>    - rpi-4.16.y (from their github)
>>>    - rpi-4.17.y (from their github)
>>>    - rpi-4.18.y (from their github)
>>>    - rpi-4.19.y (from their github)
>>>    - rpi-4.20.y (from their github)
>>>    bcm2709_defconfig
>>>    zImage modules dtbs -j12
>>>    gcc-linaro-7.3.1-2018.05-x86_64_arm-linux-gnueabihf
>>>
>>> rpi-4.14.y suffers from a busy kworker (Workqueue: hci0 hci_rx_work
>>> [bluetooth]) making tests difficult.
>>>
>>> Supposedly back in 4.4.8-v7 on raspberrypi this issue with multiple
>>> peers did not exist, while the busy kworker did pop up after some time,
>>> requiring a reboot.
>>> I did not verify or test with that rather old version yet.
>>> Would it be a good idea to start from that 4.4.8 rpi fork working up to
>>> 4.15 to find the place where it broke? I feel like this kind of work is
>>> difficult when forks are involved.
>>>
>>> Are there any components of the kernel in particular that could be verified
>>> in order to figure out what is going wrong?
>>>
>>>
>>> Yours sincerely
>>> Josua Mayer
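
For reference, the peer_lookup_dst control flow discussed at the top of this thread can be modelled in user space. What follows is only a minimal sketch, not the kernel code: struct model_peer, match_peer, lookup_original and lookup_workaround are hypothetical names, and the final loop that compares the next hop against every peer address is an assumption about what happens once a next hop has been chosen. Only the "no route" case quoted in the mail is covered.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <netinet/in.h>

/* Hypothetical stand-in for the kernel's struct lowpan_peer. */
struct model_peer {
    struct in6_addr addr;   /* link-local address of the peer */
};

/* Assumed final step: compare the chosen next hop against every peer. */
static const struct model_peer *
match_peer(const struct model_peer *peers, size_t count,
           const struct in6_addr *nexthop)
{
    for (size_t i = 0; i < count; i++)
        if (memcmp(&peers[i].addr, nexthop, sizeof(*nexthop)) == 0)
            return &peers[i];
    return NULL;
}

/* Behaviour quoted in the mail, restricted to the case rt == NULL. */
const struct model_peer *
lookup_original(const struct model_peer *peers, size_t count,
                const struct in6_addr *daddr, const struct in6_addr *gw)
{
    assert(peers != NULL && daddr != NULL && gw != NULL);
    (void)daddr;                    /* never consulted on this path */

    if (count == 1)
        return &peers[0];           /* single peer: returned unchecked */
    if (IN6_IS_ADDR_UNSPECIFIED(gw))
        return NULL;                /* no gateway: "no such peer" */
    return match_peer(peers, count, gw);
}

/* Same lookup with the proposed work-around applied. */
const struct model_peer *
lookup_workaround(const struct model_peer *peers, size_t count,
                  const struct in6_addr *daddr, const struct in6_addr *gw)
{
    const struct in6_addr *nexthop;

    assert(peers != NULL && daddr != NULL && gw != NULL);

    if (count == 1)
        return &peers[0];
    /* Neither route nor gateway: treat the destination as a direct peer. */
    nexthop = IN6_IS_ADDR_UNSPECIFIED(gw) ? daddr : gw;
    return match_peer(peers, count, nexthop);
}
```

With two peers, an unspecified gateway (::) and daddr set to fe80::b1:fcff:fe8c:6e47, lookup_original returns NULL, which corresponds to the "no such peer" message in the log, while lookup_workaround returns the first peer.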