Hi Luiz,

On 2023-08-30 19:28, Luiz Augusto von Dentz wrote:
> Hi Stefan,
>
> On Tue, Aug 29, 2023 at 1:42 PM Luiz Augusto von Dentz
> <luiz.dentz@xxxxxxxxx> wrote:
>>
>> Hi Stefan, Brian,
>>
>> On Tue, Aug 29, 2023 at 6:27 AM Stefan Agner <stefan@xxxxxxxx> wrote:
>> >
>> > Hi Thorsten,
>> >
>> > No, this hasn't been addressed so far. I am also not sure how we can
>> > help solving that particular issue.
>> >
>> > Besides this, we have other Bluetooth issues which seem to be Kernel
>> > regressions (where downgrading to Linux 5.15 also helps), folks see
>> > "hci0: unexpected event for opcode" on Intel but also other systems. We
>> > haven't bisected that issue yet. But it seems that the Bluetooth stack
>> > is really somewhat unstable in recent releases.
>>
>>
>> I suspect the following change shall make it behave as before, the use
>> of hci_cmd_sync_queue is not equivalent to hci_req_sync:
>>
>> https://gist.github.com/Vudentz/b78f34e3775c8cd2db55b868e5c8ef42
>>
>> That said, I'm considering removing the whole custom handling for
>> HCI_QUIRK_STRICT_DUPLICATE_FILTER and just disable duplicate filtering
>> when this flag is set.
>
> Any chance to test the following changes:
>
> https://patchwork.kernel.org/project/bluetooth/patch/20230829205936.766544-1-luiz.dentz@xxxxxxxxx/

I've tested this with my SPAM test device, and I can confirm that this
indeed fixes the problem we are seeing: The BLE advertisements continue
to come in just fine with the patch applied!

Thanks for the fix!

--
Stefan

>
>> > --
>> > Stefan
>> >
>> >
>> > On 2023-08-29 13:22, Linux regression tracking (Thorsten Leemhuis)
>> > wrote:
>> > > Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
>> > > for once, to make this easily accessible to everyone.
>> > >
>> > > Stefan, was this regression ever addressed? Doesn't look like it from
>> > > here, but maybe I'm missing something.
>> > >
>> > > Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>> > > --
>> > > Everything you wanna know about Linux kernel regression tracking:
>> > > https://linux-regtracking.leemhuis.info/about/#tldr
>> > > If I did something stupid, please tell me, as explained on that page.
>> > >
>> > > #regzbot poke
>> > >
>> > > On 30.06.23 12:59, Stefan Agner wrote:
>> > >> Hi Brian,
>> > >>
>> > >> Gentle ping on the issue below.
>> > >>
>> > >> On 2023-06-20 16:41, Stefan Agner wrote:
>> > >>> On 2023-06-16 03:22, Brian Gix wrote:
>> > >>>
>> > >>>> On Thu, Jun 15, 2023 at 11:28 AM Luiz Augusto von Dentz <luiz.dentz@xxxxxxxxx> wrote:
>> > >>>>
>> > >>>>> +Brian Gix
>> > >>>>>
>> > >>>>> On Thu, Jun 15, 2023 at 10:27 AM Luiz Augusto von Dentz
>> > >>>>> <luiz.dentz@xxxxxxxxx> wrote:
>> > >>>>>>
>> > >>>>>> Hi Stefan,
>> > >>>>>>
>> > >>>>>> On Thu, Jun 15, 2023 at 5:06 AM Stefan Agner <stefan@xxxxxxxx> wrote:
>> > >>>>>>>
>> > >>>>>>> Hi Brian, hi all,
>> > >>>>>>>
>> > >>>>>>> We experienced quite some Bluetooth issues after moving from Linux 5.15
>> > >>>>>>> to 6.1 on Home Assistant OS, especially on Intel NUC type systems (which
>> > >>>>>>> is a popular choice in our community, so it might just be that). When
>> > >>>>>>> continuously scanning/listening for BLE packets, the packet flow
>> > >>>>>>> suddenly ends. Depending on which and how many devices (possibly also
>> > >>>>>>> other factors) within minutes or hours.
>> > >>>>>>>
>> > >>>>>>> Jan (in cc) was able to bisect the issue, and was able to pinpoint the
>> > >>>>>>> problem to this change.
>> > >>>>>>>
>> > >>>>>>> Meanwhile I was able to confirm, that reverting this single commit on
>> > >>>>>>> the latest 6.1.34 seems to resolve the issue.
>> > >>>>>>>
>> > >>>>>>> I've reviewed the change and surrounding code, and one thing I've
>> > >>>>>>> noticed is that the if statement to set cp.filter_dup in
>> > >>>>>>> hci_le_set_ext_scan_enable_sync and hci_le_set_scan_enable_sync are
>> > >>>>>>> different. Not sure if that needs to be the way it is, but my outside
>> > >>>>>>> gut feeling says hci_le_set_ext_scan_enable_sync should use "if (val &&
>> > >>>>>>> hci_dev_test_flag(hdev, HCI_MESH))" as well.
>> > >>>>>>>
>> > >>>>>>> However, that did not fix the problem (but maybe it is wrong
>> > >>>>>>> nonetheless?).
>> > >>>>>>>
>> > >>>>>>> Anyone has an idea what could be the problem here?
>> > >>>>>>
>> > >>>>>> Are there any logs of the problem? Does any HCI command fail or
>> > >>>>>> anything so that we can track down what could be wrong?
>> > >>>
>> > >>> No HCI command fails, there is also no issue reported in the kernel log.
>> > >>> BlueZ just stops receiving BLE packets, at least from certain devices.
>> > >>>
>> > >>>>>
>> > >>>>> @Brian Gix perhaps you have a better idea what is going wrong here?
>> > >>>>
>> > >>>> It seems unlikely that this is Mesh related. Mesh does need for filtering to
>> > >>>> be FALSE, and Mesh does not use extended scanning in any case.
>> > >>>>
>> > >>>> But this was part of the final rewrite to retire the hci_req mechanism in
>> > >>>> favor of the hci_sync mechanism. So my best guess off the top of my head is
>> > >>>> that there was an unintended race condition that worked better than the
>> > >>>> synchronous single-threading mechanism? Filtering (or not) should not
>> > >>>
>> > >>> After reviewing the code I concluded the same. What is a bit surprising to
>> > >>> me is that it is so well reproducible. I guess it is nicer to have a
>> > >>> reproducible one than a hard to reproduce one :)
>> > >>>
>> > >>>> prevent advertising packets from permanently wedging. Does anyone have an
>> > >>>> HCI flow log with and without the offending patch? Ideally they should be
>> > >>>> identical... If they are not then I obviously did something wrong. As this
>> > >>>> was not specifically Mesh related, I may have missed some non-mesh corner
>> > >>>> cases.
>> > >>>
>> > >>>
>> > >>> I've taken two btmon captures, I created them using:
>> > >>> btmon -i hci0 -w /config/hcidump-hci-req-working.log
>> > >>>
>> > >>> You can find them at:
>> > >>> https://os-builds.home-assistant.io/hcidump-hci-req-working.log
>> > >>> https://os-builds.home-assistant.io/hcidump-hci-sync-non-working.log
>> > >>
>> > >> Could you gain any insights from these logs?
>> > >>
>> > >> --
>> > >> Stefan
>> > >>
>> > >>
>> > >>>
>> > >>> This is while running our user space software (Home Assistant with
>> > >>> Bluetooth integration). Besides some BLE devices (e.g. Xiaomi Mi
>> > >>> Temperature & Humidity sensor) I have an ESP32 running which sends SPAM
>> > >>> advertisements every 100ms (this accelerates the issue). In the
>> > >>> non-working case you'll see that the system doesn't receive any SPAM
>> > >>> advertisements after around 27 seconds. The working log shows that it
>> > >>> continuously receives the same packets (capture 120s).
>> > >>>
>> > >>> Hope this helps.
>> > >>>
>> > >>> --
>> > >>> Stefan
>> > >>>
>> > >>>
>> > >>
>> > >>
>>
>>
>>
>> --
>> Luiz Augusto von Dentz