Hi Herman, On Fri, Jun 2, 2023 at 2:51 AM Herman Meerlo <herman.meerlo@xxxxxxxx> wrote: > > I'm working on a Raspberry Pi device that performs periodic BLE discovery and tries to connect to multiple devices. Most of the times this goes fine but especially in BLE heavy environments every now and then it fails because it says that the device is not found. I have been debugging this issue for a few weeks now and it comes down to this: > > • Connect to DBus to start the discovery > • Devices are found, interfacesAdded callbacks are made, everything looks fine > • Stop discovery > • Directly some devices are removed by the interfacesRemoved callback > • My code does not get the device it searches for -> unhappy user > > What happens under the hood is that DBus removes all devices from it's cache that have been indicated by Bluez as being Not connectable. I.e. there is no use in keeping them around, you can't connect to them anyway. But... this is not true for the device I'm looking for. It is marked as Not connectable incorrectly. > So to chase down the problem I create a btmon dump which shows indeed that the device is reported to be Not connectable after having received a SCAN_RSP: > > > HCI Event: LE Meta Event (0x3e) plen 38 #73 [hci0] 34.359921 > LE Advertising Report (0x02) > Num reports: 1 > Event type: Scan response - SCAN_RSP (0x04) > Address type: Public (0x00) > Address: F4:B8:5E:64:02:55 (Texas Instruments) > Data length: 26 > Name (complete): BeeWi SmartLite > Peripheral Conn. Interval: 0x0028 - 0x0050 > TX power: 0 dBm > RSSI: -42 dBm (0xd6) > @ MGMT Event: Device Found (0x0012) plen 40 {0x0001} [hci0] 34.360057 > LE Address: F4:B8:5E:64:02:55 (Texas Instruments) > RSSI: -42 dBm (0xd6) > Flags: 0x00000004 > Not Connectable > Data length: 26 > Name (complete): BeeWi SmartLite > Peripheral Conn. Interval: 0x0028 - 0x0050 > TX power: 0 dBm > > But all ADV_IND PDU's before that clearly indicate that the device is connectable, it is only after this SCAN_RSP that it is reported as Not connectable: > > > HCI Event: LE Meta Event (0x3e) plen 27 #46 [hci0] 34.152817 > LE Advertising Report (0x02) > Num reports: 1 > Event type: Connectable undirected - ADV_IND (0x00) > Address type: Public (0x00) > Address: F4:B8:5E:64:02:55 (Texas Instruments) > Data length: 15 > Flags: 0x06 > LE General Discoverable Mode > BR/EDR Not Supported > Company: Texas Instruments Inc. (13) > Data: 06030108b0e408f7 > RSSI: -43 dBm (0xd5) > @ MGMT Event: Device Found (0x0012) plen 31 {0x0001} [hci0] 34.152905 > LE Address: 44:6E:FF:00:0D:65 (Resolvable) > RSSI: -74 dBm (0xb6) > Flags: 0x00000000 > Data length: 17 > Flags: 0x1a > LE General Discoverable Mode > Simultaneous LE and BR/EDR (Controller) > Simultaneous LE and BR/EDR (Host) > TX power: 9 dBm > Company: Apple, Inc. (76) > Type: Unknown (16) > Data: 01188898dc > > HCI Event: LE Meta Event (0x3e) plen 41 #47 [hci0] 34.156958 > LE Advertising Report (0x02) > Num reports: 1 > Event type: Connectable undirected - ADV_IND (0x00) > Address type: Random (0x01) > Address: FA:BD:8D:12:26:BF (Static) > Data length: 29 > Name (short): P mesh > Flags: 0x04 > BR/EDR Not Supported > 128-bit Service UUIDs (partial): 1 entry > Vendor specific > RSSI: -47 dBm (0xd1) > @ MGMT Event: Device Found (0x0012) plen 29 {0x0001} [hci0] 34.157030 > LE Address: F4:B8:5E:64:02:55 (Texas Instruments) > RSSI: -43 dBm (0xd5) > Flags: 0x00000000 > Data length: 15 > Flags: 0x06 > LE General Discoverable Mode > BR/EDR Not Supported > Company: Texas Instruments Inc. (13) > Data: 06030108b0e408f7 > > So I am heavily doubting whether the Linux kernel code has a bug in handling the SCAN_RSP. Please look at this part of the kernel: > > https://github.com/torvalds/linux/blob/48b1320a674e1ff5de2fad8606bee38f724594dc/net/bluetooth/hci_event.c#L6326 > > It sets the NOT CONNECTABLE flag by default for a SCAN_RSP and will overwrite it with any flags it has received with a previous ADV_IND. But it does not seem to take into account that in BLE heavy environments the previous ADV_IND might be of a totally different device. So every now and then it will enter the first path where it will just report the device with a NOT_CONNECTABLE flag. Or am I missing something here? You are saying that the controller would interleave ADV_IND of different peers before SCAN_RSP, in that case yes that would be possible that the device would be marked as NOT_CONNECTABLE, usually that is not really the case which is why we end up adding the code above: * In the really unlikely case that a controller get confused * and just sends a scan response event, then it is marked as * not connectable as well. I guess it is not really unlikely after all, so perhaps we need a flag indicating this is a standalone SCAN_RSP e.g. MGMT_DEV_FOUND_SCAN_RSP, then we can treat it accordingly in userspace. -- Luiz Augusto von Dentz