Hi Brian,

Gentle ping on the issue below.

On 2023-06-20 16:41, Stefan Agner wrote:
> On 2023-06-16 03:22, Brian Gix wrote:
>
>> On Thu, Jun 15, 2023 at 11:28 AM Luiz Augusto von Dentz <luiz.dentz@xxxxxxxxx> wrote:
>>
>>> +Brian Gix
>>>
>>> On Thu, Jun 15, 2023 at 10:27 AM Luiz Augusto von Dentz
>>> <luiz.dentz@xxxxxxxxx> wrote:
>>>>
>>>> Hi Stefan,
>>>>
>>>> On Thu, Jun 15, 2023 at 5:06 AM Stefan Agner <stefan@xxxxxxxx> wrote:
>>>>>
>>>>> Hi Brian, hi all,
>>>>>
>>>>> We experienced quite some Bluetooth issues after moving from Linux 5.15
>>>>> to 6.1 on Home Assistant OS, especially on Intel NUC type systems (which
>>>>> is a popular choice in our community, so it might just be that). When
>>>>> continuously scanning/listening for BLE packets, the packet flow
>>>>> suddenly ends. Depending on which and how many devices are present (and
>>>>> possibly other factors), this happens within minutes or hours.
>>>>>
>>>>> Jan (in cc) was able to bisect the issue, and was able to pinpoint the
>>>>> problem to this change.
>>>>>
>>>>> Meanwhile I was able to confirm that reverting this single commit on
>>>>> the latest 6.1.34 seems to resolve the issue.
>>>>>
>>>>> I've reviewed the change and surrounding code, and one thing I've
>>>>> noticed is that the if statements that set cp.filter_dup in
>>>>> hci_le_set_ext_scan_enable_sync and hci_le_set_scan_enable_sync are
>>>>> different. Not sure if that needs to be the way it is, but my outside
>>>>> gut feeling says hci_le_set_ext_scan_enable_sync should use "if (val &&
>>>>> hci_dev_test_flag(hdev, HCI_MESH))" as well.
>>>>>
>>>>> However, that did not fix the problem (but maybe it is wrong
>>>>> nonetheless?).
>>>>>
>>>>> Does anyone have an idea what could be the problem here?
>>>>
>>>> Are there any logs of the problem? Does any HCI command fail or
>>>> anything so that we can track down what could be wrong?
>
> No HCI command fails, there is also no issue reported in the kernel log.
> BlueZ just stops receiving BLE packets, at least from certain devices.
>
>>>
>>> @Brian Gix perhaps you have a better idea what is going wrong here?
>>
>> It seems unlikely that this is Mesh related. Mesh does need filtering to
>> be FALSE, and Mesh does not use extended scanning in any case.
>>
>> But this was part of the final rewrite to retire the hci_req mechanism in
>> favor of the hci_sync mechanism. So my best guess off the top of my head is
>> that there was an unintended race condition that worked better than the
>> synchronous single-threading mechanism? Filtering (or not) should not
>
> After reviewing the code I concluded the same. What is a bit surprising to
> me is that it is so well reproducible. I guess it is nicer to have a
> reproducible one than a hard to reproduce one :)
>
>> prevent advertising packets from permanently wedging. Does anyone have an
>> HCI flow log with and without the offending patch? Ideally they should be
>> identical... If they are not then I obviously did something wrong. As this
>> was not specifically Mesh related, I may have missed some non-mesh corner
>> cases.
>
>
> I've taken two btmon captures, created using:
> btmon -i hci0 -w /config/hcidump-hci-req-working.log
>
> You can find them at:
> https://os-builds.home-assistant.io/hcidump-hci-req-working.log
> https://os-builds.home-assistant.io/hcidump-hci-sync-non-working.log

Could you gain any insights from these logs?

--
Stefan

>
> This is while running our user space software (Home Assistant with
> Bluetooth integration). Besides some BLE devices (e.g. Xiaomi Mi
> Temperature & Humidity sensor) I have an ESP32 running which sends SPAM
> advertisements every 100ms (this accelerates the issue). In the
> non-working case you'll see that the system doesn't receive any SPAM
> advertisements after around 27 seconds. The working log shows that it
> continuously receives the same packets (capture 120s).
>
> Hope this helps.
>
> --
> Stefan