Re: [PATCH v4 2/4] Bluetooth: Rework le_scan_restart for hci_sync

Hi Luiz,

On 2023-08-30 19:28, Luiz Augusto von Dentz wrote:
> Hi Stefan,
> 
> On Tue, Aug 29, 2023 at 1:42 PM Luiz Augusto von Dentz
> <luiz.dentz@xxxxxxxxx> wrote:
>>
>> Hi Stefan, Brian,
>>
>> On Tue, Aug 29, 2023 at 6:27 AM Stefan Agner <stefan@xxxxxxxx> wrote:
>> >
>> > Hi Thorsten,
>> >
>> > No, this hasn't been addressed so far. I am also not sure how we can
>> > help solve that particular issue.
>> >
>> > Besides this, we have other Bluetooth issues which seem to be kernel
>> > regressions (where downgrading to Linux 5.15 also helps): folks see
>> > "hci0: unexpected event for opcode" on Intel but also other systems. We
>> > haven't bisected that issue yet. But it seems that the Bluetooth stack
>> > is really somewhat unstable in recent releases.
>>
>>
>> I suspect the following change should make it behave as before; the use
>> of hci_cmd_sync_queue is not equivalent to hci_req_sync:
>>
>> https://gist.github.com/Vudentz/b78f34e3775c8cd2db55b868e5c8ef42
>>
>> That said, I'm considering removing the whole custom handling for
>> HCI_QUIRK_STRICT_DUPLICATE_FILTER and just disable duplicate filtering
>> when this flag is set.
> 
> Any chance you could test the following change:
> 
> https://patchwork.kernel.org/project/bluetooth/patch/20230829205936.766544-1-luiz.dentz@xxxxxxxxx/

I've tested this with my SPAM test device, and I can confirm that this
indeed fixes the problem we are seeing: The BLE advertisements continue
to come in just fine with the patch applied!

Thanks for the fix!

--
Stefan

> 
>> > --
>> > Stefan
>> >
>> >
>> > On 2023-08-29 13:22, Linux regression tracking (Thorsten Leemhuis)
>> > wrote:
>> > > Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
>> > > for once, to make this easily accessible to everyone.
>> > >
>> > > Stefan, was this regression ever addressed? Doesn't look like it from
>> > > here, but maybe I'm missing something.
>> > >
>> > > Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>> > > --
>> > > Everything you wanna know about Linux kernel regression tracking:
>> > > https://linux-regtracking.leemhuis.info/about/#tldr
>> > > If I did something stupid, please tell me, as explained on that page.
>> > >
>> > > #regzbot poke
>> > >
>> > > On 30.06.23 12:59, Stefan Agner wrote:
>> > >> Hi Brian,
>> > >>
>> > >> Gentle ping on the issue below.
>> > >>
>> > >> On 2023-06-20 16:41, Stefan Agner wrote:
>> > >>> On 2023-06-16 03:22, Brian Gix wrote:
>> > >>>
>> > >>>> On Thu, Jun 15, 2023 at 11:28 AM Luiz Augusto von Dentz <luiz.dentz@xxxxxxxxx> wrote:
>> > >>>>
>> > >>>>> +Brian Gix
>> > >>>>>
>> > >>>>> On Thu, Jun 15, 2023 at 10:27 AM Luiz Augusto von Dentz
>> > >>>>> <luiz.dentz@xxxxxxxxx> wrote:
>> > >>>>>>
>> > >>>>>> Hi Stefan,
>> > >>>>>>
>> > >>>>>> On Thu, Jun 15, 2023 at 5:06 AM Stefan Agner <stefan@xxxxxxxx> wrote:
>> > >>>>>>>
>> > >>>>>>> Hi Brian, hi all,
>> > >>>>>>>
>> > >>>>>>> We experienced quite a few Bluetooth issues after moving from Linux 5.15
>> > >>>>>>> to 6.1 on Home Assistant OS, especially on Intel NUC type systems (which
>> > >>>>>>> are a popular choice in our community, so it might just be that). When
>> > >>>>>>> continuously scanning/listening for BLE packets, the packet flow
>> > >>>>>>> suddenly stops, within minutes or hours depending on which and how
>> > >>>>>>> many devices are around (and possibly other factors).
>> > >>>>>>>
>> > >>>>>>> Jan (in cc) was able to bisect the issue, and was able to pinpoint the
>> > >>>>>>> problem to this change.
>> > >>>>>>>
>> > >>>>>>> Meanwhile I was able to confirm that reverting this single commit on
>> > >>>>>>> the latest 6.1.34 seems to resolve the issue.
>> > >>>>>>>
>> > >>>>>>> I've reviewed the change and the surrounding code, and one thing I've
>> > >>>>>>> noticed is that the if statements that set cp.filter_dup in
>> > >>>>>>> hci_le_set_ext_scan_enable_sync and hci_le_set_scan_enable_sync
>> > >>>>>>> differ. Not sure if that needs to be the way it is, but as an outsider
>> > >>>>>>> my gut feeling says hci_le_set_ext_scan_enable_sync should use "if (val &&
>> > >>>>>>> hci_dev_test_flag(hdev, HCI_MESH))" as well.
>> > >>>>>>>
>> > >>>>>>> However, that did not fix the problem (but maybe it is wrong
>> > >>>>>>> nonetheless?).
>> > >>>>>>>
>> > >>>>>>> Does anyone have an idea what the problem could be here?
>> > >>>>>>
>> > >>>>>> Are there any logs of the problem? Does any HCI command fail, or
>> > >>>>>> anything else that could help us track down what is wrong?
>> > >>>
>> > >>> No HCI command fails, there is also no issue reported in the kernel log.
>> > >>> BlueZ just stops receiving BLE packets, at least from certain devices.
>> > >>>
>> > >>>>>
>> > >>>>> @Brian Gix perhaps you have a better idea what is going wrong here?
>> > >>>>
>> > >>>> It seems unlikely that this is Mesh related. Mesh does need filtering to
>> > >>>> be FALSE, and Mesh does not use extended scanning in any case.
>> > >>>>
>> > >>>> But this was part of the final rewrite to retire the hci_req mechanism in
>> > >>>> favor of the hci_sync mechanism. So my best guess off the top of my head is
>> > >>>> that there was an unintended race condition that worked better than the
>> > >>>> synchronous single-threading mechanism?  Filtering (or not) should not
>> > >>>
>> > >>> After reviewing the code I came to the same conclusion. What is a bit
>> > >>> surprising to me is that it is so reliably reproducible. I guess it is
>> > >>> nicer to have a reproducible issue than a hard-to-reproduce one :)
>> > >>>
>> > >>>> prevent advertising packets from permanently wedging.  Does anyone have an
>> > >>>> HCI flow log with and without the offending patch?  Ideally they should be
>> > >>>> identical...  If they are not then I obviously did something wrong. As this
>> > >>>> was not specifically Mesh related, I may have missed some non-mesh corner
>> > >>>> cases.
>> > >>>
>> > >>>
>> > >>> I've taken two btmon captures, created using:
>> > >>> btmon -i hci0 -w /config/hcidump-hci-req-working.log
>> > >>>
>> > >>> You can find them at:
>> > >>> https://os-builds.home-assistant.io/hcidump-hci-req-working.log
>> > >>> https://os-builds.home-assistant.io/hcidump-hci-sync-non-working.log
>> > >>
>> > >> Could you gain any insights from these logs?
>> > >>
>> > >> --
>> > >> Stefan
>> > >>
>> > >>
>> > >>>
>> > >>> This is while running our user space software (Home Assistant with its
>> > >>> Bluetooth integration). Besides some BLE devices (e.g. a Xiaomi Mi
>> > >>> Temperature & Humidity sensor), I have an ESP32 running which sends SPAM
>> > >>> advertisements every 100 ms (this accelerates the issue). In the
>> > >>> non-working case you'll see that the system doesn't receive any SPAM
>> > >>> advertisements after around 27 seconds. The working log shows that it
>> > >>> continuously receives the same packets (120 s capture).
>> > >>>
>> > >>> Hope this helps.
>> > >>>
>> > >>> --
>> > >>> Stefan
>> > >>>
>> > >>>
>> > >>
>> > >>
>>
>>
>>
>> --
>> Luiz Augusto von Dentz


