Re: [PATCH v4 2/4] Bluetooth: Rework le_scan_restart for hci_sync

On 29.08.23 15:27, Stefan Agner wrote:
> 
> No, this hasn't been addressed so far.

Thx and aggh. It's vacation time, so sometimes things take longer, but
that doesn't explain why nothing seems to have happened for 9 weeks now
(at least that's how it looks from here, but maybe I missed something).

Luiz, what's up here? What do you need to get down to this?

CCing the other Bluetooth maintainers just to be sure. FWIW, the thread
starts here:
https://lore.kernel.org/linux-bluetooth/578e6d7afd676129decafba846a933f5@xxxxxxxx/#t

Jan saw similar problems:
https://lore.kernel.org/linux-bluetooth/CAPa5EdBSzkuMRoHDJ5w9ESckvNvs68nAfvixyetKcQ5+YD50wA@xxxxxxxxxxxxxx/

> I am also not sure how we can
> help solving that particular issue.

Let's see if this prodding helps to get things rolling. If not, I'll
have to get higher level maintainers involved.

> Besides this, we have other Bluetooth issues which seem to be Kernel
> regressions (where downgrading to Linux 5.15 also helps), folks see
> "hci0: unexpected event for opcode" on Intel but also other systems. We
> haven't bisected that issue yet. But it seems that the Bluetooth stack
> is really somewhat unstable in recent releases.

Might be wise to create a separate thread for those and ask the
Bluetooth maintainers if they might have an idea (please CC the
regressions list as well); maybe we are lucky. If not, someone has to
bisect this to get closer to a solution.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

> On 2023-08-29 13:22, Linux regression tracking (Thorsten Leemhuis)
> wrote:
>> Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
>> for once, to make this easily accessible to everyone.
>>
>> Stefan, was this regression ever addressed? Doesn't look like it from
>> here, but maybe I'm missing something.
>>
>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>> --
>> Everything you wanna know about Linux kernel regression tracking:
>> https://linux-regtracking.leemhuis.info/about/#tldr
>> If I did something stupid, please tell me, as explained on that page.
>>
>> #regzbot poke
>>
>> On 30.06.23 12:59, Stefan Agner wrote:
>>> Hi Brian,
>>>
>>> Gentle ping on the issue below.
>>>
>>> On 2023-06-20 16:41, Stefan Agner wrote:
>>>> On 2023-06-16 03:22, Brian Gix wrote:
>>>>
>>>>> On Thu, Jun 15, 2023 at 11:28 AM Luiz Augusto von Dentz <luiz.dentz@xxxxxxxxx> wrote:
>>>>>
>>>>>> +Brian Gix
>>>>>>
>>>>>> On Thu, Jun 15, 2023 at 10:27 AM Luiz Augusto von Dentz
>>>>>> <luiz.dentz@xxxxxxxxx> wrote:
>>>>>>>
>>>>>>> Hi Stefan,
>>>>>>>
>>>>>>> On Thu, Jun 15, 2023 at 5:06 AM Stefan Agner <stefan@xxxxxxxx> wrote:
>>>>>>>>
>>>>>>>> Hi Brian, hi all,
>>>>>>>>
>>>>>>>> We experienced quite some Bluetooth issues after moving from Linux 5.15
>>>>>>>> to 6.1 on Home Assistant OS, especially on Intel NUC type systems (which
>>>>>>>> is a popular choice in our community, so it might just be that). When
>>>>>>>> continuously scanning/listening for BLE packets, the packet flow
>>>>>>>> suddenly ends. Depending on which and how many devices are present
>>>>>>>> (and possibly other factors), this happens within minutes or hours.
>>>>>>>>
>>>>>>>> Jan (in cc) was able to bisect the issue, and was able to pinpoint the
>>>>>>>> problem to this change.
>>>>>>>>
>>>>>>>> Meanwhile, I was able to confirm that reverting this single commit on
>>>>>>>> the latest 6.1.34 seems to resolve the issue.
>>>>>>>>
>>>>>>>> I've reviewed the change and surrounding code, and one thing I've
>>>>>>>> noticed is that the if statement to set cp.filter_dup in
>>>>>>>> hci_le_set_ext_scan_enable_sync and hci_le_set_scan_enable_sync are
>>>>>>>> different. Not sure if that needs to be the way it is, but my outside
>>>>>>>> gut feeling says hci_le_set_ext_scan_enable_sync should use "if (val &&
>>>>>>>> hci_dev_test_flag(hdev, HCI_MESH))" as well.
>>>>>>>>
>>>>>>>> However, that did not fix the problem (but maybe it is wrong
>>>>>>>> nonetheless?).
>>>>>>>>
>>>>>>>> Does anyone have an idea what could be the problem here?
>>>>>>>
>>>>>>> Are there any logs of the problem? Does any HCI command fail, or is
>>>>>>> there anything else that would let us track down what could be wrong?
>>>>
>>>> No HCI command fails, there is also no issue reported in the kernel log.
>>>> BlueZ just stops receiving BLE packets, at least from certain devices.
>>>>
>>>>>>
>>>>>> @Brian Gix perhaps you have a better idea what is going wrong here?
>>>>>
>>>>> It seems unlikely that this is Mesh related. Mesh does need filtering to
>>>>> be FALSE, and Mesh does not use extended scanning in any case.
>>>>>
>>>>> But this was part of the final rewrite to retire the hci_req mechanism in
>>>>> favor of the hci_sync mechanism. So my best guess off the top of my head is
>>>>> that there was an unintended race condition that worked better than the
>>>>> synchronous single-threading mechanism?  Filtering (or not) should not
>>>>
>>>> After reviewing the code I concluded the same. What is a bit surprising to
>>>> me is that it is so well reproducible. I guess it is nicer to have a
>>>> reproducible one than a hard to reproduce one :)
>>>>
>>>>> prevent advertising packets from permanently wedging.  Does anyone have an
>>>>> HCI flow log with and without the offending patch?  Ideally they should be
>>>>> identical...  If they are not then I obviously did something wrong. As this
>>>>> was not specifically Mesh related, I may have missed some non-mesh corner
>>>>> cases.
>>>>
>>>>
>>>> I've taken two btmon captures, I created them using:
>>>> btmon -i hci0 -w /config/hcidump-hci-req-working.log
>>>>
>>>> You can find them at:
>>>> https://os-builds.home-assistant.io/hcidump-hci-req-working.log
>>>> https://os-builds.home-assistant.io/hcidump-hci-sync-non-working.log
>>>
>>> Could you gain any insights from these logs?
>>>
>>> --
>>> Stefan
>>>
>>>
>>>>
>>>> This is while running our user space software (Home Assistant with
>>>> Bluetooth integration). Besides some BLE devices (e.g. Xiaomi Mi
>>>> Temperature & Humidity sensor) I have an ESP32 running which sends SPAM
>>>> advertisements every 100ms (this accelerates the issue). In the
>>>> non-working case you'll see that the system doesn't receive any SPAM
>>>> advertisements after around 27 seconds. The working log shows that it
>>>> continuously receives the same packets (capture 120s).
>>>>
>>>> Hope this helps.
>>>>
>>>> --
>>>> Stefan
>>>>
>>>>
>>>
>>>
> 
> 


