Re: [PATCH v2] Bluetooth: qca: Fix BT enable failure again for QCA6390 after warm reboot

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 2024/5/22 22:25, Luiz Augusto von Dentz wrote:
> Hi,
> 
> On Wed, May 22, 2024 at 9:33 AM Lk Sii <lk_sii@xxxxxxx> wrote:
>>
>>
>>
>> On 2024/5/21 23:48, Luiz Augusto von Dentz wrote:
>>> Hi,
>>>
>>> On Tue, May 21, 2024 at 10:52 AM Lk Sii <lk_sii@xxxxxxx> wrote:
>>>>
>>>>
>>>>
>>>> On 2024/5/16 23:55, Luiz Augusto von Dentz wrote:
>>>>> Hi,
>>>>>
>>>>> On Thu, May 16, 2024 at 10:57 AM Lk Sii <lk_sii@xxxxxxx> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2024/5/16 21:31, Zijun Hu wrote:
>>>>>>> Commit 272970be3dab ("Bluetooth: hci_qca: Fix driver shutdown on closed
>>>>>>> serdev") will cause below regression issue:
>>>>>>>
>>>>>>> BT can't be enabled after below steps:
>>>>>>> cold boot -> enable BT -> disable BT -> warm reboot -> BT enable failure
>>>>>>> if property enable-gpios is not configured within DT|ACPI for QCA6390.
>>>>>>>
>>>>>>> The commit is to fix a use-after-free issue within qca_serdev_shutdown()
>>>>>>> by adding condition to avoid the serdev is flushed or wrote after closed
>>>>>>> but also introduces this regression issue regarding above steps since the
>>>>>>> VSC is not sent to reset controller during warm reboot.
>>>>>>>
>>>>>>> Fixed by sending the VSC to reset controller within qca_serdev_shutdown()
>>>>>>> once BT was ever enabled, and the use-after-free issue is also fixed by
>>>>>>> this change since the serdev is still opened before it is flushed or wrote.
>>>>>>>
>>>>>>> Verified by the reported machine Dell XPS 13 9310 laptop over below two
>>>>>>> kernel commits:
>>>>>>> commit e00fc2700a3f ("Bluetooth: btusb: Fix triggering coredump
>>>>>>> implementation for QCA") of bluetooth-next tree.
>>>>>>> commit b23d98d46d28 ("Bluetooth: btusb: Fix triggering coredump
>>>>>>> implementation for QCA") of linus mainline tree.
>>>>>>>
>>>>>>> Fixes: 272970be3dab ("Bluetooth: hci_qca: Fix driver shutdown on closed serdev")
>>>>>>> Cc: stable@xxxxxxxxxxxxxxx
>>>>>>> Reported-by: Wren Turkal <wt@xxxxxxxxxxxxxxxx>
>>>>>>> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218726
>>>>>>> Signed-off-by: Zijun Hu <quic_zijuhu@xxxxxxxxxxx>
>>>>>>> Tested-by: Wren Turkal <wt@xxxxxxxxxxxxxxxx>
>>>>>>> ---
>>>>>>> V1 -> V2: Add comments and more commit messages
>>>>>>>
>>>>>>> V1 discussion link:
>>>>>>> https://lore.kernel.org/linux-bluetooth/d553edef-c1a4-4d52-a892-715549d31ebe@xxxxxxx/T/#t
>>>>>>>
>>>>>>>  drivers/bluetooth/hci_qca.c | 18 +++++++++++++++---
>>>>>>>  1 file changed, 15 insertions(+), 3 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
>>>>>>> index 0c9c9ee56592..9a0bc86f9aac 100644
>>>>>>> --- a/drivers/bluetooth/hci_qca.c
>>>>>>> +++ b/drivers/bluetooth/hci_qca.c
>>>>>>> @@ -2450,15 +2450,27 @@ static void qca_serdev_shutdown(struct device *dev)
>>>>>>>       struct qca_serdev *qcadev = serdev_device_get_drvdata(serdev);
>>>>>>>       struct hci_uart *hu = &qcadev->serdev_hu;
>>>>>>>       struct hci_dev *hdev = hu->hdev;
>>>>>>> -     struct qca_data *qca = hu->priv;
>>>>>>>       const u8 ibs_wake_cmd[] = { 0xFD };
>>>>>>>       const u8 edl_reset_soc_cmd[] = { 0x01, 0x00, 0xFC, 0x01, 0x05 };
>>>>>>>
>>>>>>>       if (qcadev->btsoc_type == QCA_QCA6390) {
>>>>>>> -             if (test_bit(QCA_BT_OFF, &qca->flags) ||
>>>>>>> -                 !test_bit(HCI_RUNNING, &hdev->flags))
>>>>>>> +             /* The purpose of sending the VSC is to reset SOC into a initial
>>>>>>> +              * state and the state will ensure next hdev->setup() success.
>>>>>>> +              * if HCI_QUIRK_NON_PERSISTENT_SETUP is set, it means that
>>>>>>> +              * hdev->setup() can do its job regardless of SoC state, so
>>>>>>> +              * don't need to send the VSC.
>>>>>>> +              * if HCI_SETUP is set, it means that hdev->setup() was never
>>>>>>> +              * invoked and the SOC is already in the initial state, so
>>>>>>> +              * don't also need to send the VSC.
>>>>>>> +              */
>>>>>>> +             if (test_bit(HCI_QUIRK_NON_PERSISTENT_SETUP, &hdev->quirks) ||
>>>>>>> +                 hci_dev_test_flag(hdev, HCI_SETUP))
>>>>>>>                       return;
>>>> The main purpose of above checking is NOT to make sure the serdev within
>>>> open state as its comments explained.
>>>>>>>
>>>>>>> +             /* The serdev must be in open state when conrol logic arrives
>>>>>>> +              * here, so also fix the use-after-free issue caused by that
>>>>>>> +              * the serdev is flushed or wrote after it is closed.
>>>>>>> +              */
>>>>>>>               serdev_device_write_flush(serdev);
>>>>>>>               ret = serdev_device_write_buf(serdev, ibs_wake_cmd,
>>>>>>>                                             sizeof(ibs_wake_cmd));
>>>>>> i believe Zijun's change is able to fix both below issues and don't
>>>>>> introduce new issue.
>>>>>>
>>>>>> regression issue A:  BT enable failure after warm reboot.
>>>>>> issue B:  use-after-free issue, namely, kernel crash.
>>>>>>
>>>>>>
>>>>>> For issue B, i have more findings related to below commits ordered by time.
>>>>>>
>>>>>> Commit A: 7e7bbddd029b ("Bluetooth: hci_qca: Fix qca6390 enable failure
>>>>>> after warm reboot")
>>>>>>
>>>>>> Commit B: de8892df72be ("Bluetooth: hci_serdev: Close UART port if
>>>>>> NON_PERSISTENT_SETUP is set")
>>>>>> this commit introduces issue B, it is also not suitable to associate
>>>>>> protocol state with state of lower level transport type such as serdev
>>>>>> or uart, in my opinion, protocol state should be independent with
>>>>>> transport type state, flag HCI_UART_PROTO_READY is for protocol state,
>>>>>> it means if protocol hu->proto is initialized and if we can invoke its
>>>>>> interfaces.it is common for various kinds of transport types. perhaps,
>>>>>> this is the reason why Zijun's change doesn't use flag HCI_UART_PROTO_READY.
>>>>>
>>>>> Don't really follow you here, if HCI_UART_PROTO_READY indicates the
>>>>> protocol state they is even _more_ important to use before invoking
>>>>> serdev APIs, so checking for the quirk sound like a problem because:
>>>>>
>>>>> [1] hci_uart_close
>>>>>      /* When QUIRK HCI_QUIRK_NON_PERSISTENT_SETUP is set by driver,
>>>>>      * BT SOC is completely powered OFF during BT OFF, holding port
>>>>>      * open may drain the battery.
>>>>>      */
>>>>>     if (test_bit(HCI_QUIRK_NON_PERSISTENT_SETUP, &hdev->quirks)) {
>>>>>         clear_bit(HCI_UART_PROTO_READY, &hu->flags);
>>>>>         serdev_device_close(hu->serdev);
>>>>>     }
>>>>>
>>>>> [2] hci_uart_unregister_device
>>>>>     if (test_bit(HCI_UART_PROTO_READY, &hu->flags)) {
>>>>>         clear_bit(HCI_UART_PROTO_READY, &hu->flags);
>>>>>         serdev_device_close(hu->serdev);
>>>>>     }
>>>>> both case 1 and case 2 were introduced by Commit B in question which
>>>> uses protocol state flag HCI_UART_PROTO_READY to track lower level
>>>> transport type state, i don't think it is perfect.
>>>>
>>>> for common files hci_serdev.c and hci_ldisc.c, as you saw, the purpose
>>>> of checking HCI_UART_PROTO_READY is to call protocol relevant
>>>> interfaces, moreover, these protocol relevant interfaces do not deal
>>>> with lower transport state. you maybe even notice below present function
>>>> within which lower level serdev is flushed before HCI_UART_PROTO_READY
>>>> is checked:
>>>>
>>>> static int hci_uart_flush(struct hci_dev *hdev)
>>>> {
>>>> ......
>>>>         /* Flush any pending characters in the driver and discipline. */
>>>>         serdev_device_write_flush(hu->serdev);
>>>>
>>>>         if (test_bit(HCI_UART_PROTO_READY, &hu->flags))
>>>>                 hu->proto->flush(hu);
>>>>
>>>>         return 0;
>>>> }
>>>>
>>>> in my opinion, that is why qca_serdev_shutdown() does not check
>>>> HCI_UART_PROTO_READY for later lower level serdev operations.
>>>>> So only in case 1 checking the quirk is equivalent to
>>>>> HCI_UART_PROTO_READY on case 2 it does actually check the quirk and
>>>>> will proceed to call serdev_device_close, now perhaps the code is
>>>>> assuming that shutdown won't be called after that, but it looks it
>>>>> does since:
>>>>>
>>>> qca_serdev_shutdown() will never be called after case 2 as explained
>>>> in the end.
>>>>> static void serdev_drv_remove(struct device *dev)
>>>>> {
>>>>>     const struct serdev_device_driver *sdrv =
>>>>> to_serdev_device_driver(dev->driver);
>>>>>     if (sdrv->remove)
>>>>>         sdrv->remove(to_serdev_device(dev));
>>>>>
>>>>>     dev_pm_domain_detach(dev, true);
>>>>> }
>>>>>
>>>>> dev_pm_domain_detach says it will power off so I assume that means
>>>>> that shutdown will be called _after_ remove, so not I'm not really
>>>>> convinced that we can avoid using HCI_UART_PROTO_READY, in fact the
>>>>> following sequence might always be triggering:
>>>>>
>>>> dev_pm_domain_detach() should be irrelevant with qca_serdev_shutdown(),
>>>> should not trigger call of qca_serdev_shutdown() as explained in the end
>>>>> serdev_drv_remove -> qca_serdev_remove -> hci_uart_unregister_device
>>>>> -> serdev_device_close -> qca_close -> kfree(qca)
>>>>> dev_pm_domain_detach -> ??? -> qca_serdev_shutdown
>>>>>
>>>>> If this sequence is correct then qca_serdev_shutdown accessing
>>>>> qca_data will always result in a UAF problem.
>>>>>
>>>> above sequence should not correct as explained below.
>>>>
>>>> serdev and its driver should also follow below generic device and driver
>>>> design.
>>>>
>>>> 1)
>>>> driver->shutdown() will be called during shut-down time at this time
>>>> driver->remove() should not have been called.
>>>>
>>>> 2)
>>>> driver->shutdown() is impossible to be called once driver->remove()
>>>> was called.
>>>>
>>>> 3) for serdev, driver->remove() does not trigger call of
>>>> driver->shutdown() since PM relevant poweroff is irrelevant with
>>>> driver->shutdown() and i also don't find any PM relevant interfaces will
>>>> call driver->shutdown().
>>>>
>>>> i would like to explain issue B based on comments Zijun posted by public
>>>> as below:
>>>>
>>>> issue B actually happens during reboot and let me look at these steps
>>>> boot -> enable BT -> disable BT -> reboot.
>>>>
>>>> 1) step boot will call driver->probe() to register hdev and the serdev
>>>> is opened after boot.
>>>>
>>>> 2) step enable will call hdev->open() and the serdev will still in open
>>>> state
>>>>
>>>> 3) step disable will call hdev->close() and the serdev will be closed
>>>> after hdev->close() for machine with config which results in
>>>> HCI_QUIRK_NON_PERSISTENT_SETUP is set.
>>>>
>>>> 4) step reboot will call qca_serdev_shutdown() which will flush and
>>>> write the serdev which are closed by above step disable, so cause the
>>>> UAF issue, namely, kernel crash issue.
>>>>
>>>> so this issue is caused by commit B which close the serdev during
>>>> hdev->close().
>>>>
>>>> driver->remove() even is not triggered during above steps.
>>>>>> Commit C: 272970be3dab ("Bluetooth: hci_qca: Fix driver shutdown on
>>>>>> closed serdev")
>>>>>> this commit is to fix issue B which is actually caused by Commit B, but
>>>>>> it has Fixes tag for Commit A. and it also introduces the regression
>>>>>> issue A.
>>>>>>
>>>>>
>>>>>
>>>
>>> Reading again the commit message for the UAF fix it sounds like a
>>> different problem:
>>>
>> no, the UAF issue commit C fixes should be the same issue descripted by
>> me previously as explained below:
>>
>> the UAF issue happened with machine "qualcomm Technologies, Inc.
>> Robotics RB5 (DT)", the machine uses qca6390 and have property
>> enable-gpios configured, which will results in that quirk
>> HCI_QUIRK_NON_PERSISTENT_SETUP is set, so must meet the UAF issue
>> for normal operation sequences "boot -> enable BT -> disable BT -> reboot".
> 
> Wait, are you telling me that UAF was wrongly described? Or perhaps
> they are just different trees and you don't see the problem because
> you are not actually running with the mainline code?
> 
sorry, please ignore my last comment to simply discussion
>> Actually, only machines which uses QCA6390 and have property
>> enable-gpios configured will meet the UAF issue as commented by Zijun
>> with below link
>> https://lore.kernel.org/linux-bluetooth/9ac11453-b7cf-43f3-8e46-f96e41ef190d@xxxxxxxxxxx/
>>
>>>     The driver shutdown callback (which sends EDL_SOC_RESET to the device
>>>     over serdev) should not be invoked when HCI device is not open (e.g. if
>>>     hci_dev_open_sync() failed), because the serdev and its TTY are not open
>>>     either.  Also skip this step if device is powered off
>>>     (qca_power_shutdown()).
>>>
>>> So if hci_dev_open_sync has failed it says serdev and its TTY will not
>>> be open either, so I guess that's why HCI_SETUP was added as a
>>> condition to bail out? So it seems correct to do that although I'd
>>> change the comments.
>>>
>> i believe hci_dev_open_sync failure should not really happens with the
>> machine Robotics RB5, the purpose that it is mentioned with commit
>> message is to illustrate that the serdev in closed state is operated and
>> causes the UAF issue.
> 
> Sorry but Im not really following, there seems to be instances where
> qca driver will fail on hci_dev_open_sync, now the shutdown sequence
> is only done for a specific model, not a machine like Robotics RB5,
> btw you are not really contributing to solving the problem if you are
> throwing us back in different directions like this.
> 
sorry, please ignore my last comment to simply discussion
>> let us assume that hci_dev_open_sync failure -> serdev is not opened ->
>> UAF issue happens within qca_serdev_shutdown(), then BT will not be
>> working at all and the commit C is actually a workaroud instead of a fix
>> since the right approach is to solve the hci_dev_open_sync failure which
>> happens firstly.
>>
>>
>> Frankly, only checking quirk HCI_QUIRK_NON_PERSISTENT_SETUP is enough to
>> fix the UAF issue caused by either "normal operation sequences" or
>> "hci_dev_open_sync failure".
> 
> Ok, just stop contributing to this thread, you don't know what you are
> talking about, HCI_QUIRK_NON_PERSISTENT_SETUP has nothing to do with
> serdev state.
> 
okay, please ignore my last comment to simply discussion about Zijun's
change.
>>> @Krzysztof Kozlowski do you still have a test setup for 272970be3dab
>>> ("Bluetooth: hci_qca: Fix driver shutdown on closed serdev"), can you
>>> try with these changes?
>>>
>>
> 
> 





[Index of Archives]     [Bluez Devel]     [Linux Wireless Networking]     [Linux Wireless Personal Area Networking]     [Linux ATH6KL]     [Linux USB Devel]     [Linux Media Drivers]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [Big List of Linux Books]

  Powered by Linux