Re: path to landing patch to fix warm boot issue for qca6390

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 2024/5/11 07:33, Wren Turkal wrote:
> On 5/10/24 2:25 PM, Luiz Augusto von Dentz wrote:
>> Hi Wren,
>>
>> On Fri, May 10, 2024 at 4:54 PM Wren Turkal <wt@xxxxxxxxxxxxxxxx> wrote:
>>>
>>> On 5/10/24 12:48 PM, Luiz Augusto von Dentz wrote:
>>>> Hi Wren,
>>>>
>>>> On Fri, May 10, 2024 at 3:14 PM Wren Turkal <wt@xxxxxxxxxxxxxxxx>
>>>> wrote:
>>>>>
>>>>> On 5/6/24 12:49 PM, Luiz Augusto von Dentz wrote:
>>>>>> Hi Wren,
>>>>>>
>>>>>> On Mon, May 6, 2024 at 3:24 PM Wren Turkal <wt@xxxxxxxxxxxxxxxx>
>>>>>> wrote:
>>>>>>>
>>>>>>> Krzysztof,
>>>>>>>
>>>>>>> I am reaching out to you as you had the most important objections
>>>>>>> to the
>>>>>>> change to fix qca6390 for the warm boot/module reload bug that I am
>>>>>>> experiencing.
>>>>>>>
>>>>>>> For context, the problem is that the hci_uart module will send
>>>>>>> specific
>>>>>>> vendor specfic commands during shutdown of the hardware under most
>>>>>>> situations. These VSCs put the bluetooth device into a
>>>>>>> non-working state
>>>>>>> on my Dell XPS 13 9310 with qca6390 bluetooth hardware.
>>>>>>>
>>>>>>> Zijun's proposed fix is to not send these commands when it's not
>>>>>>> appropriate for the hardware. The vendor commands should be
>>>>>>> avoided when
>>>>>>> the hardware does not have persistent configuration or when the
>>>>>>> device
>>>>>>> is in setup state (indicating that is has never been setup and
>>>>>>> should
>>>>>>> not be sent the VSCs on the shutdown path). This is what Zijun's
>>>>>>> patch
>>>>>>> implements.
>>>>>>>
>>>>>>> In addition, Zijun's change removes the influence of both
>>>>>>> the QCA_BT_OFF qca flag and and HCI_RUNNING hdev flag. Zijun asserts
>>>>>>> that those flags should not influence the sending of the VSCs in the
>>>>>>> shutdown path. If I understand KK's objections properly, this is
>>>>>>> where
>>>>>>> his objection is stemming from. KK, is this correct?
>>>>>>>
>>>>>>> Zijun's proposed fix can be seen here:
>>>>>>> https://patchwork.kernel.org/project/bluetooth/patch/1713932807-19619-3-git-send-email-quic_zijuhu@xxxxxxxxxxx/
>>>>>>>
>>>>>>> I'm wondering if we can resolve this impasse by splitting the change
>>>>>>> into two changes, as follows:
>>>>>>>
>>>>>>> 1. Change that removes the influence of the QCA_BT_OFF and
>>>>>>> HCI_RUNNING
>>>>>>> flags in the shutdown path.
>>>>>>> 2. Add the quirk from Zijun's patch that fixes my hardward
>>>>>>> configuration.
>>>>>>>
>>>>>>> I'm hoping that better clearer descriptions for #1 can help get that
>>>>>>> landed since the logic current appears to be at odds with how the
>>>>>>> hardware works.
>>>>>>>
>>>>>>> Also, I am happy to split the patches into the two patches, or
>>>>>>> (maybe
>>>>>>> more ideally) just modify the commit message to better indicate the
>>>>>>> reason the change. I just need guidance from maintainers so that
>>>>>>> whatever work I do leads to something acceptable for y'all.
>>>>>>>
>>>>>>> So, please help me get this done. I am just a user with broken
>>>>>>> hardware
>>>>>>> and a fondness for Linux. I would love to help do what's needed
>>>>>>> to get
>>>>>>> this fix landed.
>>>>>>
>>>>>> Ive also objected to that change, in fact the whole shutdown sequence
>>>>>> is sort of bogus in my opinion and the driver shall really have some
>>>>>> means to find out what mode it is in when it reboots, regardless if
>>>>>> cold or warm boot, since otherwise we are in trouble if the user is
>>>>>> booting from another OS that doesn't do the expected shutdown
>>>>>> sequence.
>>>>>
>>>>> This criticism makes a ton of sense. I'm sorry I missed it before.
>>>>> There
>>>>> were a lot of threads moving in parallel. However, I am curious. Given
>>>>> that the patch improves the situation for users (like me). Is there
>>>>> any
>>>>> way we can separate the redesign of the shutdown sequence and the UX
>>>>> improvement that comes with this patch?
>>>>>
>>>>> Here's my concern. I am happy to do the work to redesign this.
>>>>> However,
>>>>> I don't think I have the information needed to do this since I don't
>>>>> have access to the technical docs for the qca6390. I am worried
>>>>> that not
>>>>> accepting some form of this patch is letting perfect be the enemy
>>>>> of the
>>>>> good. And I am not sure how I personally can help with that. If you
>>>>> think it's possible for me to do this without the docs for the
>>>>> hardware,
>>>>> I am willing to give it a shot if I can get some guidance. Honestly, I
>>>>> wish I had the skill to be confident about a change like this, but
>>>>> I don't.
>>>>>
>>>>> Any ideas on how to move forward would be greatly appreciated.
>>>>>
>>>>> And just to be perfectly clear, I have tested this patch on my laptop.
>>>>> It greatly enhances my ability to use my hardware since I can
>>>>> reboot the
>>>>> machine without having to make sure to power cycle the laptop. This is
>>>>> not a theoretical improvement.
>>>>
>>>> I would really love some explanation why can't the driver know under
>>>> what mode the controller is when it gets probed, because to me we
>>>> cannot accept a driver that only works under certain condition after
>>>> the boot and in case it is really impossible, can't even power cycle
>>>> it to get it back to cold boot stage???
>>>
>>> This is a great technical criticism of the driver, and I think you
>>> deserve that explanation.
>>>
>>> However, with the driver already in the kernel, shouldn't the bias be
>>> toward mitigating the extremely bad UX and not hold users hostage for
>>> the bad design which has already been approved and landed in the kernel?
>>>
>>>> Also the criticism here should be directed to the vendor, how long
>>>> have we been discussing problems in the QCA driver? And the only thing
>>>> I see coming our way are work-arounds of the problems, the address not
>>>> being unique coming from the firmware itself and when provided via DT
>>>> the address is in the wrong byteorder and now that the driver must
>>>> communicate the firmware on shutdown in order to get it working
>>>> properly on the next boot.
>>>
>>> I agree that Qualcomm should get flack for this, however, the UX problem
>>> can be mitigated with a logic fix that doesn't make the init/shutdown
>>> design problem any worse than it currently seems to be. I mean, wouldn't
>>> this logic have to exist somewhere even if it weren't the shutdown path?
>>>
>>> If you are trying to use this as leverage to get Qualcomm to do a bigger
>>> thing (redesign the init/shutdown logic), I do think that tactic
>>> needlessly puts users in the crossfire. I can totally understand why
>>> you'd do it. I am just suffering the crossfire in the meantime, and it
>>> doesn't feel great.
>>
>> So you prefer to risk getting a kernel crash due to UAF over Bluetooth
>> not working? Really? Because I haven't seen any configuration that
>> those changes you tested don't reintroduce the UAF, which is why I
>> haven't applied that change to begin with, so no I'm not holding back
>> to pressure Qualcomm to redesign the shutdown logic, it just these
>> things got entangled because I just realized the shutdown thingy is
>> really out of place, imo, but I'd be fine if there is a temporary fix
>> until someone finally decide to spend some time to really fix the
>> shutdown logic.
> 
> Luiz, I'm sorry. I do not want a crash instead. I didn't understand that
> the solution I proposed (i.e. adding Zijun's logic without removing KK's
> logic) would introduce a new crash opportunity. I previously thought you
> were saying one of the following things:
> 1. The crash opportunity already existed due the init/shudown sequences.
> 2. The crash opportunity already existed due the init/shudown sequences
> when removing KK's logic.
> 
> If it was #1, I was hoping that adding the logic would make the risk no
> worse.
> 
> If it was #2, I was hoping that my suggestion of adding Zijun's logic
> without removing KK's logic might represent an acceptable middleground
> for a temporary fix that would "correct" the logic without introducing a
> new crash opportunity.
> 
> I feel like I may not be clear about what I mean by adding Zijun's logic
> and not removing KK's logic. Maybe something like this diff:
> 
> diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
> index 2f7ae38d85eb..fcac44ae7898 100644
> --- a/drivers/bluetooth/hci_qca.c
> +++ b/drivers/bluetooth/hci_qca.c
> @@ -2456,6 +2456,10 @@ static void qca_serdev_shutdown(struct device *dev)
>                      !test_bit(HCI_RUNNING, &hdev->flags))
>                          return;
> 
> +               if (test_bit(HCI_QUIRK_NON_PERSISTENT_SETUP,
> &hdev->quirks) ||
> +                   hci_dev_test_flag(hdev, HCI_SETUP))
> +                       return;
> +
>                  serdev_device_write_flush(serdev);
>                  ret = serdev_device_write_buf(serdev, ibs_wake_cmd,
>                                                sizeof(ibs_wake_cmd));
> 
> I think this diff is mangled due to using Thunderbird, but I hope this
> helps convey what I was asking about.
> 
> If I am understanding you correctly now, you are saying that simply
> introducing Zijun's logic (without removing KK's logic) will introduce a
> new crash opportunity. Is that correct?
> 

as Zijun declared. i believe Zijun's change will solve both this
reported regression issue and the use-after-free(crash) issue.

> Thanks,
> wt





[Index of Archives]     [Bluez Devel]     [Linux Wireless Networking]     [Linux Wireless Personal Area Networking]     [Linux ATH6KL]     [Linux USB Devel]     [Linux Media Drivers]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [Big List of Linux Books]

  Powered by Linux