Re: path to landing patch to fix warm boot issue for qca6390

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Wren,

On Mon, May 13, 2024 at 4:13 PM Wren Turkal <wt@xxxxxxxxxxxxxxxx> wrote:
>
> On 5/10/24 11:25 PM, Lk Sii wrote:
> > On 2024/5/11 07:33, Wren Turkal wrote:
> >> On 5/10/24 2:25 PM, Luiz Augusto von Dentz wrote:
> >>> Hi Wren,
> >>>
> >>> On Fri, May 10, 2024 at 4:54 PM Wren Turkal <wt@xxxxxxxxxxxxxxxx> wrote:
> >>>>
> >>>> On 5/10/24 12:48 PM, Luiz Augusto von Dentz wrote:
> >>>>> Hi Wren,
> >>>>>
> >>>>> On Fri, May 10, 2024 at 3:14 PM Wren Turkal <wt@xxxxxxxxxxxxxxxx>
> >>>>> wrote:
> >>>>>>
> >>>>>> On 5/6/24 12:49 PM, Luiz Augusto von Dentz wrote:
> >>>>>>> Hi Wren,
> >>>>>>>
> >>>>>>> On Mon, May 6, 2024 at 3:24 PM Wren Turkal <wt@xxxxxxxxxxxxxxxx>
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> Krzysztof,
> >>>>>>>>
> >>>>>>>> I am reaching out to you as you had the most important objections
> >>>>>>>> to the
> >>>>>>>> change to fix qca6390 for the warm boot/module reload bug that I am
> >>>>>>>> experiencing.
> >>>>>>>>
> >>>>>>>> For context, the problem is that the hci_uart module will send
> >>>>>>>> specific
> >>>>>>>> vendor specfic commands during shutdown of the hardware under most
> >>>>>>>> situations. These VSCs put the bluetooth device into a
> >>>>>>>> non-working state
> >>>>>>>> on my Dell XPS 13 9310 with qca6390 bluetooth hardware.
> >>>>>>>>
> >>>>>>>> Zijun's proposed fix is to not send these commands when it's not
> >>>>>>>> appropriate for the hardware. The vendor commands should be
> >>>>>>>> avoided when
> >>>>>>>> the hardware does not have persistent configuration or when the
> >>>>>>>> device
> >>>>>>>> is in setup state (indicating that is has never been setup and
> >>>>>>>> should
> >>>>>>>> not be sent the VSCs on the shutdown path). This is what Zijun's
> >>>>>>>> patch
> >>>>>>>> implements.
> >>>>>>>>
> >>>>>>>> In addition, Zijun's change removes the influence of both
> >>>>>>>> the QCA_BT_OFF qca flag and and HCI_RUNNING hdev flag. Zijun asserts
> >>>>>>>> that those flags should not influence the sending of the VSCs in the
> >>>>>>>> shutdown path. If I understand KK's objections properly, this is
> >>>>>>>> where
> >>>>>>>> his objection is stemming from. KK, is this correct?
> >>>>>>>>
> >>>>>>>> Zijun's proposed fix can be seen here:
> >>>>>>>> https://patchwork.kernel.org/project/bluetooth/patch/1713932807-19619-3-git-send-email-quic_zijuhu@xxxxxxxxxxx/
> >>>>>>>>
> >>>>>>>> I'm wondering if we can resolve this impasse by splitting the change
> >>>>>>>> into two changes, as follows:
> >>>>>>>>
> >>>>>>>> 1. Change that removes the influence of the QCA_BT_OFF and
> >>>>>>>> HCI_RUNNING
> >>>>>>>> flags in the shutdown path.
> >>>>>>>> 2. Add the quirk from Zijun's patch that fixes my hardward
> >>>>>>>> configuration.
> >>>>>>>>
> >>>>>>>> I'm hoping that better clearer descriptions for #1 can help get that
> >>>>>>>> landed since the logic current appears to be at odds with how the
> >>>>>>>> hardware works.
> >>>>>>>>
> >>>>>>>> Also, I am happy to split the patches into the two patches, or
> >>>>>>>> (maybe
> >>>>>>>> more ideally) just modify the commit message to better indicate the
> >>>>>>>> reason the change. I just need guidance from maintainers so that
> >>>>>>>> whatever work I do leads to something acceptable for y'all.
> >>>>>>>>
> >>>>>>>> So, please help me get this done. I am just a user with broken
> >>>>>>>> hardware
> >>>>>>>> and a fondness for Linux. I would love to help do what's needed
> >>>>>>>> to get
> >>>>>>>> this fix landed.
> >>>>>>>
> >>>>>>> Ive also objected to that change, in fact the whole shutdown sequence
> >>>>>>> is sort of bogus in my opinion and the driver shall really have some
> >>>>>>> means to find out what mode it is in when it reboots, regardless if
> >>>>>>> cold or warm boot, since otherwise we are in trouble if the user is
> >>>>>>> booting from another OS that doesn't do the expected shutdown
> >>>>>>> sequence.
> >>>>>>
> >>>>>> This criticism makes a ton of sense. I'm sorry I missed it before.
> >>>>>> There
> >>>>>> were a lot of threads moving in parallel. However, I am curious. Given
> >>>>>> that the patch improves the situation for users (like me). Is there
> >>>>>> any
> >>>>>> way we can separate the redesign of the shutdown sequence and the UX
> >>>>>> improvement that comes with this patch?
> >>>>>>
> >>>>>> Here's my concern. I am happy to do the work to redesign this.
> >>>>>> However,
> >>>>>> I don't think I have the information needed to do this since I don't
> >>>>>> have access to the technical docs for the qca6390. I am worried
> >>>>>> that not
> >>>>>> accepting some form of this patch is letting perfect be the enemy
> >>>>>> of the
> >>>>>> good. And I am not sure how I personally can help with that. If you
> >>>>>> think it's possible for me to do this without the docs for the
> >>>>>> hardware,
> >>>>>> I am willing to give it a shot if I can get some guidance. Honestly, I
> >>>>>> wish I had the skill to be confident about a change like this, but
> >>>>>> I don't.
> >>>>>>
> >>>>>> Any ideas on how to move forward would be greatly appreciated.
> >>>>>>
> >>>>>> And just to be perfectly clear, I have tested this patch on my laptop.
> >>>>>> It greatly enhances my ability to use my hardware since I can
> >>>>>> reboot the
> >>>>>> machine without having to make sure to power cycle the laptop. This is
> >>>>>> not a theoretical improvement.
> >>>>>
> >>>>> I would really love some explanation why can't the driver know under
> >>>>> what mode the controller is when it gets probed, because to me we
> >>>>> cannot accept a driver that only works under certain condition after
> >>>>> the boot and in case it is really impossible, can't even power cycle
> >>>>> it to get it back to cold boot stage???
> >>>>
> >>>> This is a great technical criticism of the driver, and I think you
> >>>> deserve that explanation.
> >>>>
> >>>> However, with the driver already in the kernel, shouldn't the bias be
> >>>> toward mitigating the extremely bad UX and not hold users hostage for
> >>>> the bad design which has already been approved and landed in the kernel?
> >>>>
> >>>>> Also the criticism here should be directed to the vendor, how long
> >>>>> have we been discussing problems in the QCA driver? And the only thing
> >>>>> I see coming our way are work-arounds of the problems, the address not
> >>>>> being unique coming from the firmware itself and when provided via DT
> >>>>> the address is in the wrong byteorder and now that the driver must
> >>>>> communicate the firmware on shutdown in order to get it working
> >>>>> properly on the next boot.
> >>>>
> >>>> I agree that Qualcomm should get flack for this, however, the UX problem
> >>>> can be mitigated with a logic fix that doesn't make the init/shutdown
> >>>> design problem any worse than it currently seems to be. I mean, wouldn't
> >>>> this logic have to exist somewhere even if it weren't the shutdown path?
> >>>>
> >>>> If you are trying to use this as leverage to get Qualcomm to do a bigger
> >>>> thing (redesign the init/shutdown logic), I do think that tactic
> >>>> needlessly puts users in the crossfire. I can totally understand why
> >>>> you'd do it. I am just suffering the crossfire in the meantime, and it
> >>>> doesn't feel great.
> >>>
> >>> So you prefer to risk getting a kernel crash due to UAF over Bluetooth
> >>> not working? Really? Because I haven't seen any configuration that
> >>> those changes you tested don't reintroduce the UAF, which is why I
> >>> haven't applied that change to begin with, so no I'm not holding back
> >>> to pressure Qualcomm to redesign the shutdown logic, it just these
> >>> things got entangled because I just realized the shutdown thingy is
> >>> really out of place, imo, but I'd be fine if there is a temporary fix
> >>> until someone finally decide to spend some time to really fix the
> >>> shutdown logic.
> >>
> >> Luiz, I'm sorry. I do not want a crash instead. I didn't understand that
> >> the solution I proposed (i.e. adding Zijun's logic without removing KK's
> >> logic) would introduce a new crash opportunity. I previously thought you
> >> were saying one of the following things:
> >> 1. The crash opportunity already existed due the init/shudown sequences.
> >> 2. The crash opportunity already existed due the init/shudown sequences
> >> when removing KK's logic.
> >>
> >> If it was #1, I was hoping that adding the logic would make the risk no
> >> worse.
> >>
> >> If it was #2, I was hoping that my suggestion of adding Zijun's logic
> >> without removing KK's logic might represent an acceptable middleground
> >> for a temporary fix that would "correct" the logic without introducing a
> >> new crash opportunity.
> >>
> >> I feel like I may not be clear about what I mean by adding Zijun's logic
> >> and not removing KK's logic. Maybe something like this diff:
> >>
> >> diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
> >> index 2f7ae38d85eb..fcac44ae7898 100644
> >> --- a/drivers/bluetooth/hci_qca.c
> >> +++ b/drivers/bluetooth/hci_qca.c
> >> @@ -2456,6 +2456,10 @@ static void qca_serdev_shutdown(struct device *dev)
> >>                       !test_bit(HCI_RUNNING, &hdev->flags))
> >>                           return;
> >>
> >> +               if (test_bit(HCI_QUIRK_NON_PERSISTENT_SETUP,
> >> &hdev->quirks) ||
> >> +                   hci_dev_test_flag(hdev, HCI_SETUP))
> >> +                       return;
> >> +
> >>                   serdev_device_write_flush(serdev);
> >>                   ret = serdev_device_write_buf(serdev, ibs_wake_cmd,
> >>                                                 sizeof(ibs_wake_cmd));
> >>
> >> I think this diff is mangled due to using Thunderbird, but I hope this
> >> helps convey what I was asking about.
> >>
> >> If I am understanding you correctly now, you are saying that simply
> >> introducing Zijun's logic (without removing KK's logic) will introduce a
> >> new crash opportunity. Is that correct?
> >>
> >
> > as Zijun declared. i believe Zijun's change will solve both this
> > reported regression issue and the use-after-free(crash) issue.
>
> I did see Zijun's claim of that. However, I think that both KK and Luiz
> are not convinced by the explanation. Also, if that explanation does
> convince KK and Luiz, I think that the explanation needs to be added to
> the commit message.
>
> I'm hoping that Luiz will at least respond to the middleground I
> proposed as a workaround.

I recall suggesting using HCI_UART_PROTO_READY instead since that
tells when serdev_device_close has been run:

diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
index 0c9c9ee56592..bbbc86d4932a 100644
--- a/drivers/bluetooth/hci_qca.c
+++ b/drivers/bluetooth/hci_qca.c
@@ -2455,8 +2455,8 @@ static void qca_serdev_shutdown(struct device *dev)
        const u8 edl_reset_soc_cmd[] = { 0x01, 0x00, 0xFC, 0x01, 0x05 };

        if (qcadev->btsoc_type == QCA_QCA6390) {
-               if (test_bit(QCA_BT_OFF, &qca->flags) ||
-                   !test_bit(HCI_RUNNING, &hdev->flags))
+               /* Check if serdev_device_close() has already been called. */
+               if (!test_bit(HCI_UART_PROTO_READY, &hu->flags))
                        return;

                serdev_device_write_flush(serdev);


-- 
Luiz Augusto von Dentz





[Index of Archives]     [Bluez Devel]     [Linux Wireless Networking]     [Linux Wireless Personal Area Networking]     [Linux ATH6KL]     [Linux USB Devel]     [Linux Media Drivers]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [Big List of Linux Books]

  Powered by Linux