Hi Wren, On Mon, May 13, 2024 at 4:13 PM Wren Turkal <wt@xxxxxxxxxxxxxxxx> wrote: > > On 5/10/24 11:25 PM, Lk Sii wrote: > > On 2024/5/11 07:33, Wren Turkal wrote: > >> On 5/10/24 2:25 PM, Luiz Augusto von Dentz wrote: > >>> Hi Wren, > >>> > >>> On Fri, May 10, 2024 at 4:54 PM Wren Turkal <wt@xxxxxxxxxxxxxxxx> wrote: > >>>> > >>>> On 5/10/24 12:48 PM, Luiz Augusto von Dentz wrote: > >>>>> Hi Wren, > >>>>> > >>>>> On Fri, May 10, 2024 at 3:14 PM Wren Turkal <wt@xxxxxxxxxxxxxxxx> > >>>>> wrote: > >>>>>> > >>>>>> On 5/6/24 12:49 PM, Luiz Augusto von Dentz wrote: > >>>>>>> Hi Wren, > >>>>>>> > >>>>>>> On Mon, May 6, 2024 at 3:24 PM Wren Turkal <wt@xxxxxxxxxxxxxxxx> > >>>>>>> wrote: > >>>>>>>> > >>>>>>>> Krzysztof, > >>>>>>>> > >>>>>>>> I am reaching out to you as you had the most important objections > >>>>>>>> to the > >>>>>>>> change to fix qca6390 for the warm boot/module reload bug that I am > >>>>>>>> experiencing. > >>>>>>>> > >>>>>>>> For context, the problem is that the hci_uart module will send > >>>>>>>> specific > >>>>>>>> vendor specfic commands during shutdown of the hardware under most > >>>>>>>> situations. These VSCs put the bluetooth device into a > >>>>>>>> non-working state > >>>>>>>> on my Dell XPS 13 9310 with qca6390 bluetooth hardware. > >>>>>>>> > >>>>>>>> Zijun's proposed fix is to not send these commands when it's not > >>>>>>>> appropriate for the hardware. The vendor commands should be > >>>>>>>> avoided when > >>>>>>>> the hardware does not have persistent configuration or when the > >>>>>>>> device > >>>>>>>> is in setup state (indicating that is has never been setup and > >>>>>>>> should > >>>>>>>> not be sent the VSCs on the shutdown path). This is what Zijun's > >>>>>>>> patch > >>>>>>>> implements. > >>>>>>>> > >>>>>>>> In addition, Zijun's change removes the influence of both > >>>>>>>> the QCA_BT_OFF qca flag and and HCI_RUNNING hdev flag. Zijun asserts > >>>>>>>> that those flags should not influence the sending of the VSCs in the > >>>>>>>> shutdown path. If I understand KK's objections properly, this is > >>>>>>>> where > >>>>>>>> his objection is stemming from. KK, is this correct? > >>>>>>>> > >>>>>>>> Zijun's proposed fix can be seen here: > >>>>>>>> https://patchwork.kernel.org/project/bluetooth/patch/1713932807-19619-3-git-send-email-quic_zijuhu@xxxxxxxxxxx/ > >>>>>>>> > >>>>>>>> I'm wondering if we can resolve this impasse by splitting the change > >>>>>>>> into two changes, as follows: > >>>>>>>> > >>>>>>>> 1. Change that removes the influence of the QCA_BT_OFF and > >>>>>>>> HCI_RUNNING > >>>>>>>> flags in the shutdown path. > >>>>>>>> 2. Add the quirk from Zijun's patch that fixes my hardward > >>>>>>>> configuration. > >>>>>>>> > >>>>>>>> I'm hoping that better clearer descriptions for #1 can help get that > >>>>>>>> landed since the logic current appears to be at odds with how the > >>>>>>>> hardware works. > >>>>>>>> > >>>>>>>> Also, I am happy to split the patches into the two patches, or > >>>>>>>> (maybe > >>>>>>>> more ideally) just modify the commit message to better indicate the > >>>>>>>> reason the change. I just need guidance from maintainers so that > >>>>>>>> whatever work I do leads to something acceptable for y'all. > >>>>>>>> > >>>>>>>> So, please help me get this done. I am just a user with broken > >>>>>>>> hardware > >>>>>>>> and a fondness for Linux. I would love to help do what's needed > >>>>>>>> to get > >>>>>>>> this fix landed. > >>>>>>> > >>>>>>> Ive also objected to that change, in fact the whole shutdown sequence > >>>>>>> is sort of bogus in my opinion and the driver shall really have some > >>>>>>> means to find out what mode it is in when it reboots, regardless if > >>>>>>> cold or warm boot, since otherwise we are in trouble if the user is > >>>>>>> booting from another OS that doesn't do the expected shutdown > >>>>>>> sequence. > >>>>>> > >>>>>> This criticism makes a ton of sense. I'm sorry I missed it before. > >>>>>> There > >>>>>> were a lot of threads moving in parallel. However, I am curious. Given > >>>>>> that the patch improves the situation for users (like me). Is there > >>>>>> any > >>>>>> way we can separate the redesign of the shutdown sequence and the UX > >>>>>> improvement that comes with this patch? > >>>>>> > >>>>>> Here's my concern. I am happy to do the work to redesign this. > >>>>>> However, > >>>>>> I don't think I have the information needed to do this since I don't > >>>>>> have access to the technical docs for the qca6390. I am worried > >>>>>> that not > >>>>>> accepting some form of this patch is letting perfect be the enemy > >>>>>> of the > >>>>>> good. And I am not sure how I personally can help with that. If you > >>>>>> think it's possible for me to do this without the docs for the > >>>>>> hardware, > >>>>>> I am willing to give it a shot if I can get some guidance. Honestly, I > >>>>>> wish I had the skill to be confident about a change like this, but > >>>>>> I don't. > >>>>>> > >>>>>> Any ideas on how to move forward would be greatly appreciated. > >>>>>> > >>>>>> And just to be perfectly clear, I have tested this patch on my laptop. > >>>>>> It greatly enhances my ability to use my hardware since I can > >>>>>> reboot the > >>>>>> machine without having to make sure to power cycle the laptop. This is > >>>>>> not a theoretical improvement. > >>>>> > >>>>> I would really love some explanation why can't the driver know under > >>>>> what mode the controller is when it gets probed, because to me we > >>>>> cannot accept a driver that only works under certain condition after > >>>>> the boot and in case it is really impossible, can't even power cycle > >>>>> it to get it back to cold boot stage??? > >>>> > >>>> This is a great technical criticism of the driver, and I think you > >>>> deserve that explanation. > >>>> > >>>> However, with the driver already in the kernel, shouldn't the bias be > >>>> toward mitigating the extremely bad UX and not hold users hostage for > >>>> the bad design which has already been approved and landed in the kernel? > >>>> > >>>>> Also the criticism here should be directed to the vendor, how long > >>>>> have we been discussing problems in the QCA driver? And the only thing > >>>>> I see coming our way are work-arounds of the problems, the address not > >>>>> being unique coming from the firmware itself and when provided via DT > >>>>> the address is in the wrong byteorder and now that the driver must > >>>>> communicate the firmware on shutdown in order to get it working > >>>>> properly on the next boot. > >>>> > >>>> I agree that Qualcomm should get flack for this, however, the UX problem > >>>> can be mitigated with a logic fix that doesn't make the init/shutdown > >>>> design problem any worse than it currently seems to be. I mean, wouldn't > >>>> this logic have to exist somewhere even if it weren't the shutdown path? > >>>> > >>>> If you are trying to use this as leverage to get Qualcomm to do a bigger > >>>> thing (redesign the init/shutdown logic), I do think that tactic > >>>> needlessly puts users in the crossfire. I can totally understand why > >>>> you'd do it. I am just suffering the crossfire in the meantime, and it > >>>> doesn't feel great. > >>> > >>> So you prefer to risk getting a kernel crash due to UAF over Bluetooth > >>> not working? Really? Because I haven't seen any configuration that > >>> those changes you tested don't reintroduce the UAF, which is why I > >>> haven't applied that change to begin with, so no I'm not holding back > >>> to pressure Qualcomm to redesign the shutdown logic, it just these > >>> things got entangled because I just realized the shutdown thingy is > >>> really out of place, imo, but I'd be fine if there is a temporary fix > >>> until someone finally decide to spend some time to really fix the > >>> shutdown logic. > >> > >> Luiz, I'm sorry. I do not want a crash instead. I didn't understand that > >> the solution I proposed (i.e. adding Zijun's logic without removing KK's > >> logic) would introduce a new crash opportunity. I previously thought you > >> were saying one of the following things: > >> 1. The crash opportunity already existed due the init/shudown sequences. > >> 2. The crash opportunity already existed due the init/shudown sequences > >> when removing KK's logic. > >> > >> If it was #1, I was hoping that adding the logic would make the risk no > >> worse. > >> > >> If it was #2, I was hoping that my suggestion of adding Zijun's logic > >> without removing KK's logic might represent an acceptable middleground > >> for a temporary fix that would "correct" the logic without introducing a > >> new crash opportunity. > >> > >> I feel like I may not be clear about what I mean by adding Zijun's logic > >> and not removing KK's logic. Maybe something like this diff: > >> > >> diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c > >> index 2f7ae38d85eb..fcac44ae7898 100644 > >> --- a/drivers/bluetooth/hci_qca.c > >> +++ b/drivers/bluetooth/hci_qca.c > >> @@ -2456,6 +2456,10 @@ static void qca_serdev_shutdown(struct device *dev) > >> !test_bit(HCI_RUNNING, &hdev->flags)) > >> return; > >> > >> + if (test_bit(HCI_QUIRK_NON_PERSISTENT_SETUP, > >> &hdev->quirks) || > >> + hci_dev_test_flag(hdev, HCI_SETUP)) > >> + return; > >> + > >> serdev_device_write_flush(serdev); > >> ret = serdev_device_write_buf(serdev, ibs_wake_cmd, > >> sizeof(ibs_wake_cmd)); > >> > >> I think this diff is mangled due to using Thunderbird, but I hope this > >> helps convey what I was asking about. > >> > >> If I am understanding you correctly now, you are saying that simply > >> introducing Zijun's logic (without removing KK's logic) will introduce a > >> new crash opportunity. Is that correct? > >> > > > > as Zijun declared. i believe Zijun's change will solve both this > > reported regression issue and the use-after-free(crash) issue. > > I did see Zijun's claim of that. However, I think that both KK and Luiz > are not convinced by the explanation. Also, if that explanation does > convince KK and Luiz, I think that the explanation needs to be added to > the commit message. > > I'm hoping that Luiz will at least respond to the middleground I > proposed as a workaround. I recall suggesting using HCI_UART_PROTO_READY instead since that tells when serdev_device_close has been run: diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c index 0c9c9ee56592..bbbc86d4932a 100644 --- a/drivers/bluetooth/hci_qca.c +++ b/drivers/bluetooth/hci_qca.c @@ -2455,8 +2455,8 @@ static void qca_serdev_shutdown(struct device *dev) const u8 edl_reset_soc_cmd[] = { 0x01, 0x00, 0xFC, 0x01, 0x05 }; if (qcadev->btsoc_type == QCA_QCA6390) { - if (test_bit(QCA_BT_OFF, &qca->flags) || - !test_bit(HCI_RUNNING, &hdev->flags)) + /* Check if serdev_device_close() has already been called. */ + if (!test_bit(HCI_UART_PROTO_READY, &hu->flags)) return; serdev_device_write_flush(serdev); -- Luiz Augusto von Dentz