On 29.08.23 15:27, Stefan Agner wrote:
>
> No, this hasn't been addressed so far.

Thx and aggh. It's vacation time, so sometimes things take longer, but
that doesn't explain why nothing seems to have happened for 9 weeks now
(at least that's how it looks from here, but maybe I missed something).

Luiz, what's up here? What do you need to get down to this? CCing the
other Bluetooth maintainers just to be sure.

FWIW, the thread starts here:
https://lore.kernel.org/linux-bluetooth/578e6d7afd676129decafba846a933f5@xxxxxxxx/#t

Jan saw similar problems:
https://lore.kernel.org/linux-bluetooth/CAPa5EdBSzkuMRoHDJ5w9ESckvNvs68nAfvixyetKcQ5+YD50wA@xxxxxxxxxxxxxx/

> I am also not sure how we can
> help solving that particular issue.

Let's see if this prodding helps to get things rolling. If not, I'll
have to get higher level maintainers involved.

> Besides this, we have other Bluetooth issues which seem to be Kernel
> regressions (where downgrading to Linux 5.15 also helps), folks see
> "hci0: unexpected event for opcode" on Intel but also other systems. We
> haven't bisected that issue yet. But it seems that the Bluetooth stack
> is really somewhat unstable in recent releases.

Might be wise to create a separate thread for those and ask the
Bluetooth maintainers if they might have an idea (please CC the
regressions list as well), maybe we are lucky; if not, someone has to
bisect this to get closer to a solution.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

> On 2023-08-29 13:22, Linux regression tracking (Thorsten Leemhuis)
> wrote:
>> Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
>> for once, to make this easily accessible to everyone.
>>
>> Stefan, was this regression ever addressed?
>> Doesn't look like it from here, but maybe I'm missing something.
>>
>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>> --
>> Everything you wanna know about Linux kernel regression tracking:
>> https://linux-regtracking.leemhuis.info/about/#tldr
>> If I did something stupid, please tell me, as explained on that page.
>>
>> #regzbot poke
>>
>> On 30.06.23 12:59, Stefan Agner wrote:
>>> Hi Brian,
>>>
>>> Gentle ping on the issue below.
>>>
>>> On 2023-06-20 16:41, Stefan Agner wrote:
>>>> On 2023-06-16 03:22, Brian Gix wrote:
>>>>
>>>>> On Thu, Jun 15, 2023 at 11:28 AM Luiz Augusto von Dentz <luiz.dentz@xxxxxxxxx> wrote:
>>>>>
>>>>>> +Brian Gix
>>>>>>
>>>>>> On Thu, Jun 15, 2023 at 10:27 AM Luiz Augusto von Dentz
>>>>>> <luiz.dentz@xxxxxxxxx> wrote:
>>>>>>>
>>>>>>> Hi Stefan,
>>>>>>>
>>>>>>> On Thu, Jun 15, 2023 at 5:06 AM Stefan Agner <stefan@xxxxxxxx> wrote:
>>>>>>>>
>>>>>>>> Hi Brian, hi all,
>>>>>>>>
>>>>>>>> We experienced quite a few Bluetooth issues after moving from Linux 5.15
>>>>>>>> to 6.1 on Home Assistant OS, especially on Intel NUC type systems (which
>>>>>>>> is a popular choice in our community, so it might just be that). When
>>>>>>>> continuously scanning/listening for BLE packets, the packet flow
>>>>>>>> suddenly ends, within minutes or hours depending on which and how many
>>>>>>>> devices are around (and possibly other factors).
>>>>>>>>
>>>>>>>> Jan (in cc) was able to bisect the issue, and was able to pinpoint the
>>>>>>>> problem to this change.
>>>>>>>>
>>>>>>>> Meanwhile I was able to confirm that reverting this single commit on
>>>>>>>> the latest 6.1.34 seems to resolve the issue.
>>>>>>>>
>>>>>>>> I've reviewed the change and surrounding code, and one thing I've
>>>>>>>> noticed is that the if statements that set cp.filter_dup in
>>>>>>>> hci_le_set_ext_scan_enable_sync and hci_le_set_scan_enable_sync are
>>>>>>>> different.
>>>>>>>> Not sure if that needs to be the way it is, but my outside
>>>>>>>> gut feeling says hci_le_set_ext_scan_enable_sync should use "if (val &&
>>>>>>>> hci_dev_test_flag(hdev, HCI_MESH))" as well.
>>>>>>>>
>>>>>>>> However, that did not fix the problem (but maybe it is wrong
>>>>>>>> nonetheless?).
>>>>>>>>
>>>>>>>> Does anyone have an idea what could be the problem here?
>>>>>>>
>>>>>>> Are there any logs of the problem? Does any HCI command fail or
>>>>>>> anything so that we can track down what could be wrong?
>>>>
>>>> No HCI command fails, and there is also no issue reported in the kernel log.
>>>> BlueZ just stops receiving BLE packets, at least from certain devices.
>>>>
>>>>>>
>>>>>> @Brian Gix perhaps you have a better idea what is going wrong here?
>>>>>
>>>>> It seems unlikely that this is Mesh related. Mesh does need filtering to
>>>>> be FALSE, and Mesh does not use extended scanning in any case.
>>>>>
>>>>> But this was part of the final rewrite to retire the hci_req mechanism in
>>>>> favor of the hci_sync mechanism. So my best guess off the top of my head is
>>>>> that there was an unintended race condition that worked better than the
>>>>> synchronous single-threading mechanism? Filtering (or not) should not
>>>>
>>>> After reviewing the code I concluded the same. What is a bit surprising to
>>>> me is that it is so well reproducible. I guess it is nicer to have a
>>>> reproducible one than a hard to reproduce one :)
>>>>
>>>>> prevent advertising packets from permanently wedging. Does anyone have an
>>>>> HCI flow log with and without the offending patch? Ideally they should be
>>>>> identical... If they are not then I obviously did something wrong. As this
>>>>> was not specifically Mesh related, I may have missed some non-mesh corner
>>>>> cases.
>>>>
>>>>
>>>> I've taken two btmon captures. I created them using:
>>>> btmon -i hci0 -w /config/hcidump-hci-req-working.log
>>>>
>>>> You can find them at:
>>>> https://os-builds.home-assistant.io/hcidump-hci-req-working.log
>>>> https://os-builds.home-assistant.io/hcidump-hci-sync-non-working.log
>>>
>>> Could you gain any insights from these logs?
>>>
>>> --
>>> Stefan
>>>
>>>
>>>>
>>>> This is while running our user space software (Home Assistant with
>>>> Bluetooth integration). Besides some BLE devices (e.g. Xiaomi Mi
>>>> Temperature & Humidity sensor) I have an ESP32 running which sends SPAM
>>>> advertisements every 100ms (this accelerates the issue). In the
>>>> non-working case you'll see that the system doesn't receive any SPAM
>>>> advertisements after around 27 seconds. The working log shows that it
>>>> continuously receives the same packets (capture 120s).
>>>>
>>>> Hope this helps.
>>>>
>>>> --
>>>> Stefan
>>>>
>>>>
>>>
>>>
>
>
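
For anyone comparing the two captures, here is a minimal sketch of how the
point where the advertisements stop could be located automatically. It
assumes the arrival times of the ESP32 advertisements have already been
extracted from each capture into a list of millisecond timestamps; that
extraction step is not shown and depends on how the capture is decoded. The
function name find_stall and the parameter max_gap_ms are hypothetical,
chosen only for illustration.

```python
from typing import List, Optional, Tuple


def find_stall(
    timestamps_ms: List[int],
    capture_end_ms: int,
    max_gap_ms: int = 1000,
) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first silence longer than max_gap_ms.

    timestamps_ms  -- arrival times of the advertisements, in milliseconds
                      since the start of the capture (assumed input)
    capture_end_ms -- total length of the capture in milliseconds
    max_gap_ms     -- longest silence still considered normal; 1000 ms is
                      an arbitrary choice for a 100 ms advertising interval

    Returns None when no silence exceeds max_gap_ms.
    """
    previous = 0  # treat the start of the capture as the first reference point
    for ts in sorted(timestamps_ms):
        if ts - previous > max_gap_ms:
            return (previous, ts)
        previous = ts
    # Also check the tail: a stall that lasts until the capture ends.
    if capture_end_ms - previous > max_gap_ms:
        return (previous, capture_end_ms)
    return None


if __name__ == "__main__":
    # Synthetic data modelled on the description above, not taken from the logs.
    working = list(range(0, 120_000, 100))  # one packet every 100 ms for 120 s
    stalled = list(range(0, 27_000, 100))   # packets stop after about 27 s
    print(find_stall(working, 120_000))
    print(find_stall(stalled, 120_000))
```

With the synthetic input above, the working case yields no stall and the
stalled case reports a silence from the last packet seen until the end of
the capture. Real captures would need a threshold that tolerates the
occasional missed advertisement.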