Re: [PATCH v2] Bluetooth: qca: Fix BT enable failure again for QCA6390 after warm reboot

On 2024/5/21 23:48, Luiz Augusto von Dentz wrote:
> Hi,
> 
> On Tue, May 21, 2024 at 10:52 AM Lk Sii <lk_sii@xxxxxxx> wrote:
>>
>>
>>
>> On 2024/5/16 23:55, Luiz Augusto von Dentz wrote:
>>> Hi,
>>>
>>> On Thu, May 16, 2024 at 10:57 AM Lk Sii <lk_sii@xxxxxxx> wrote:
>>>>
>>>>
>>>>
>>>> On 2024/5/16 21:31, Zijun Hu wrote:
>>>>> Commit 272970be3dab ("Bluetooth: hci_qca: Fix driver shutdown on closed
>>>>> serdev") will cause below regression issue:
>>>>>
>>>>> BT can't be enabled after below steps:
>>>>> cold boot -> enable BT -> disable BT -> warm reboot -> BT enable failure
>>>>> if property enable-gpios is not configured within DT|ACPI for QCA6390.
>>>>>
>>>>> The commit is to fix a use-after-free issue within qca_serdev_shutdown()
>>>>> by adding condition to avoid the serdev is flushed or wrote after closed
>>>>> but also introduces this regression issue regarding above steps since the
>>>>> VSC is not sent to reset controller during warm reboot.
>>>>>
>>>>> Fixed by sending the VSC to reset controller within qca_serdev_shutdown()
>>>>> once BT was ever enabled, and the use-after-free issue is also fixed by
>>>>> this change since the serdev is still opened before it is flushed or wrote.
>>>>>
>>>>> Verified by the reported machine Dell XPS 13 9310 laptop over below two
>>>>> kernel commits:
>>>>> commit e00fc2700a3f ("Bluetooth: btusb: Fix triggering coredump
>>>>> implementation for QCA") of bluetooth-next tree.
>>>>> commit b23d98d46d28 ("Bluetooth: btusb: Fix triggering coredump
>>>>> implementation for QCA") of linus mainline tree.
>>>>>
>>>>> Fixes: 272970be3dab ("Bluetooth: hci_qca: Fix driver shutdown on closed serdev")
>>>>> Cc: stable@xxxxxxxxxxxxxxx
>>>>> Reported-by: Wren Turkal <wt@xxxxxxxxxxxxxxxx>
>>>>> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218726
>>>>> Signed-off-by: Zijun Hu <quic_zijuhu@xxxxxxxxxxx>
>>>>> Tested-by: Wren Turkal <wt@xxxxxxxxxxxxxxxx>
>>>>> ---
>>>>> V1 -> V2: Add comments and more commit messages
>>>>>
>>>>> V1 discussion link:
>>>>> https://lore.kernel.org/linux-bluetooth/d553edef-c1a4-4d52-a892-715549d31ebe@xxxxxxx/T/#t
>>>>>
>>>>>  drivers/bluetooth/hci_qca.c | 18 +++++++++++++++---
>>>>>  1 file changed, 15 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
>>>>> index 0c9c9ee56592..9a0bc86f9aac 100644
>>>>> --- a/drivers/bluetooth/hci_qca.c
>>>>> +++ b/drivers/bluetooth/hci_qca.c
>>>>> @@ -2450,15 +2450,27 @@ static void qca_serdev_shutdown(struct device *dev)
>>>>>       struct qca_serdev *qcadev = serdev_device_get_drvdata(serdev);
>>>>>       struct hci_uart *hu = &qcadev->serdev_hu;
>>>>>       struct hci_dev *hdev = hu->hdev;
>>>>> -     struct qca_data *qca = hu->priv;
>>>>>       const u8 ibs_wake_cmd[] = { 0xFD };
>>>>>       const u8 edl_reset_soc_cmd[] = { 0x01, 0x00, 0xFC, 0x01, 0x05 };
>>>>>
>>>>>       if (qcadev->btsoc_type == QCA_QCA6390) {
>>>>> -             if (test_bit(QCA_BT_OFF, &qca->flags) ||
>>>>> -                 !test_bit(HCI_RUNNING, &hdev->flags))
>>>>> +             /* The purpose of sending the VSC is to reset SOC into a initial
>>>>> +              * state and the state will ensure next hdev->setup() success.
>>>>> +              * if HCI_QUIRK_NON_PERSISTENT_SETUP is set, it means that
>>>>> +              * hdev->setup() can do its job regardless of SoC state, so
>>>>> +              * don't need to send the VSC.
>>>>> +              * if HCI_SETUP is set, it means that hdev->setup() was never
>>>>> +              * invoked and the SOC is already in the initial state, so
>>>>> +              * don't also need to send the VSC.
>>>>> +              */
>>>>> +             if (test_bit(HCI_QUIRK_NON_PERSISTENT_SETUP, &hdev->quirks) ||
>>>>> +                 hci_dev_test_flag(hdev, HCI_SETUP))
>>>>>                       return;
>> The main purpose of the above check is NOT to ensure that the serdev is
>> in the open state, as its comment explains.
>>>>>
>>>>> +             /* The serdev must be in open state when conrol logic arrives
>>>>> +              * here, so also fix the use-after-free issue caused by that
>>>>> +              * the serdev is flushed or wrote after it is closed.
>>>>> +              */
>>>>>               serdev_device_write_flush(serdev);
>>>>>               ret = serdev_device_write_buf(serdev, ibs_wake_cmd,
>>>>>                                             sizeof(ibs_wake_cmd));
>>>> I believe Zijun's change fixes both issues below and does not
>>>> introduce a new issue.
>>>>
>>>> regression issue A:  BT enable failure after warm reboot.
>>>> issue B:  use-after-free issue, namely, kernel crash.
>>>>
>>>>
>>>> For issue B, I have more findings related to the commits below, listed
>>>> in chronological order.
>>>>
>>>> Commit A: 7e7bbddd029b ("Bluetooth: hci_qca: Fix qca6390 enable failure
>>>> after warm reboot")
>>>>
>>>> Commit B: de8892df72be ("Bluetooth: hci_serdev: Close UART port if
>>>> NON_PERSISTENT_SETUP is set")
>>>> This commit introduces issue B. In my opinion, it is also not suitable
>>>> to associate protocol state with the state of a lower-level transport
>>>> such as serdev or UART; protocol state should be independent of
>>>> transport state. The flag HCI_UART_PROTO_READY is for protocol state:
>>>> it indicates whether the protocol hu->proto is initialized and whether
>>>> its interfaces can be invoked, and it is common to all transport types.
>>>> Perhaps this is why Zijun's change doesn't use the flag
>>>> HCI_UART_PROTO_READY.
>>>
>>> I don't really follow you here. If HCI_UART_PROTO_READY indicates the
>>> protocol state, then it is even _more_ important to check it before
>>> invoking serdev APIs, so checking for the quirk sounds like a problem
>>> because:
>>>
>>> [1] hci_uart_close
>>>      /* When QUIRK HCI_QUIRK_NON_PERSISTENT_SETUP is set by driver,
>>>      * BT SOC is completely powered OFF during BT OFF, holding port
>>>      * open may drain the battery.
>>>      */
>>>     if (test_bit(HCI_QUIRK_NON_PERSISTENT_SETUP, &hdev->quirks)) {
>>>         clear_bit(HCI_UART_PROTO_READY, &hu->flags);
>>>         serdev_device_close(hu->serdev);
>>>     }
>>>
>>> [2] hci_uart_unregister_device
>>>     if (test_bit(HCI_UART_PROTO_READY, &hu->flags)) {
>>>         clear_bit(HCI_UART_PROTO_READY, &hu->flags);
>>>         serdev_device_close(hu->serdev);
>>>     }
>> Both case 1 and case 2 were introduced by Commit B in question, which
>> uses the protocol state flag HCI_UART_PROTO_READY to track lower-level
>> transport state; I don't think that is ideal.
>>
>> For the common files hci_serdev.c and hci_ldisc.c, as you saw, the
>> purpose of checking HCI_UART_PROTO_READY is to guard calls to
>> protocol-relevant interfaces; moreover, these interfaces do not deal
>> with lower-level transport state. You may even notice the existing
>> function below, in which the lower-level serdev is flushed before
>> HCI_UART_PROTO_READY is checked:
>>
>> static int hci_uart_flush(struct hci_dev *hdev)
>> {
>> ......
>>         /* Flush any pending characters in the driver and discipline. */
>>         serdev_device_write_flush(hu->serdev);
>>
>>         if (test_bit(HCI_UART_PROTO_READY, &hu->flags))
>>                 hu->proto->flush(hu);
>>
>>         return 0;
>> }
>>
>> In my opinion, that is why qca_serdev_shutdown() does not check
>> HCI_UART_PROTO_READY before its later lower-level serdev operations.
>>> So only in case 1 is checking the quirk equivalent to checking
>>> HCI_UART_PROTO_READY; in case 2 it doesn't actually check the quirk and
>>> will proceed to call serdev_device_close. Now perhaps the code is
>>> assuming that shutdown won't be called after that, but it looks like it
>>> is, since:
>>>
>> qca_serdev_shutdown() will never be called after case 2, as explained
>> at the end.
>>> static void serdev_drv_remove(struct device *dev)
>>> {
>>>     const struct serdev_device_driver *sdrv =
>>> to_serdev_device_driver(dev->driver);
>>>     if (sdrv->remove)
>>>         sdrv->remove(to_serdev_device(dev));
>>>
>>>     dev_pm_domain_detach(dev, true);
>>> }
>>>
>>> dev_pm_domain_detach says it will power off, so I assume that means
>>> that shutdown will be called _after_ remove, so I'm not really
>>> convinced that we can avoid using HCI_UART_PROTO_READY. In fact, the
>>> following sequence might always be triggered:
>>>
>> dev_pm_domain_detach() should be unrelated to qca_serdev_shutdown() and
>> should not trigger a call to qca_serdev_shutdown(), as explained at the
>> end.
>>> serdev_drv_remove -> qca_serdev_remove -> hci_uart_unregister_device
>>> -> serdev_device_close -> qca_close -> kfree(qca)
>>> dev_pm_domain_detach -> ??? -> qca_serdev_shutdown
>>>
>>> If this sequence is correct then qca_serdev_shutdown accessing
>>> qca_data will always result in a UAF problem.
>>>
>> The above sequence should not be correct, as explained below.
>>
>> serdev and its driver should also follow the generic device and driver
>> design below.
>>
>> 1)
>> driver->shutdown() is called at system shutdown time; at that point
>> driver->remove() should not have been called.
>>
>> 2)
>> driver->shutdown() cannot be called once driver->remove() has been
>> called.
>>
>> 3) For serdev, driver->remove() does not trigger a call to
>> driver->shutdown(), since the PM power-off is unrelated to
>> driver->shutdown(), and I also cannot find any PM interface that
>> calls driver->shutdown().
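
To make points 1) to 3) concrete, here is a minimal userspace sketch. It is
not kernel code: struct model_device, struct model_driver, model_unbind() and
model_system_shutdown() are hypothetical names, and the sketch assumes that
the driver core only calls shutdown() on a device that still has a driver
bound, and that remove() drops the binding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of the driver core; all names are illustrative. */
struct model_driver {
	void (*remove)(void *ctx);
	void (*shutdown)(void *ctx);
};

struct model_device {
	const struct model_driver *driver;	/* NULL when no driver is bound */
	void *ctx;
};

struct model_counters {
	int removes;
	int shutdowns;
};

/* Unbind: call remove() and drop the binding (point 2). */
static void model_unbind(struct model_device *dev)
{
	assert(dev != NULL);
	if (dev->driver && dev->driver->remove)
		dev->driver->remove(dev->ctx);
	dev->driver = NULL;
}

/* System shutdown: only a still-bound driver gets shutdown() (point 1).
 * Returns true if shutdown() was actually invoked.
 */
static bool model_system_shutdown(struct model_device *dev)
{
	assert(dev != NULL);
	if (!dev->driver || !dev->driver->shutdown)
		return false;
	dev->driver->shutdown(dev->ctx);
	return true;
}

static void count_remove(void *ctx)
{
	((struct model_counters *)ctx)->removes++;
}

static void count_shutdown(void *ctx)
{
	((struct model_counters *)ctx)->shutdowns++;
}

/* Stand-in for a serdev driver that provides both callbacks. */
static const struct model_driver model_qca_driver = {
	.remove = count_remove,
	.shutdown = count_shutdown,
};
```

In this model, model_unbind() never calls shutdown() (point 3), and a later
model_system_shutdown() on the unbound device is a no-op.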
>>
>> I would like to explain issue B based on the comments Zijun posted
>> publicly, as below:
>>
>> Issue B actually happens during reboot. Consider these steps:
>> boot -> enable BT -> disable BT -> reboot.
>>
>> 1) The boot step calls driver->probe() to register hdev, and the serdev
>> is open after boot.
>>
>> 2) The enable step calls hdev->open(), and the serdev remains in the
>> open state.
>>
>> 3) The disable step calls hdev->close(), and the serdev is closed
>> after hdev->close() on machines whose configuration results in
>> HCI_QUIRK_NON_PERSISTENT_SETUP being set.
>>
>> 4) The reboot step calls qca_serdev_shutdown(), which flushes and
>> writes the serdev that was closed by the disable step above, causing
>> the UAF issue, namely the kernel crash.
>>
>> So this issue is caused by commit B, which closes the serdev during
>> hdev->close().
>>
>> driver->remove() is not even triggered during the above steps.
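
The four steps above can be replayed with a small userspace sketch. It is not
kernel code: struct model_bt and the model_*() helpers are hypothetical, and
the sketch assumes that qca_serdev_shutdown() had no guard at all before
commit C, and that hdev->open() leaves the serdev open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Outcome of the modelled shutdown callback. */
enum model_shutdown_result {
	MODEL_VSC_SENT,			/* serdev was open, VSC written */
	MODEL_CLOSED_SERDEV_ACCESS,	/* serdev was closed: models issue B */
};

/* Hypothetical model of the state relevant to issue B. */
struct model_bt {
	bool quirk_non_persistent;	/* models HCI_QUIRK_NON_PERSISTENT_SETUP */
	bool serdev_open;
};

/* Step 1: probe registers hdev and leaves the serdev open. */
static void model_probe(struct model_bt *bt, bool quirk)
{
	assert(bt != NULL);
	bt->quirk_non_persistent = quirk;
	bt->serdev_open = true;
}

/* Step 2: enabling BT keeps (or puts) the serdev in the open state. */
static void model_hdev_open(struct model_bt *bt)
{
	bt->serdev_open = true;
}

/* Step 3: disabling BT closes the serdev only when the quirk is set. */
static void model_hdev_close(struct model_bt *bt)
{
	if (bt->quirk_non_persistent)
		bt->serdev_open = false;
}

/* Step 4: an unguarded shutdown flushes and writes regardless of state. */
static enum model_shutdown_result
model_shutdown_unguarded(const struct model_bt *bt)
{
	return bt->serdev_open ? MODEL_VSC_SENT : MODEL_CLOSED_SERDEV_ACCESS;
}
```

With the quirk set, the sequence ends in MODEL_CLOSED_SERDEV_ACCESS; without
it, the serdev stays open and the VSC is sent.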
>>>> Commit C: 272970be3dab ("Bluetooth: hci_qca: Fix driver shutdown on
>>>> closed serdev")
>>>> This commit is meant to fix issue B, which is actually caused by
>>>> Commit B, but it carries a Fixes tag for Commit A, and it also
>>>> introduces regression issue A.
>>>>
>>>
>>>
> 
> Reading again the commit message for the UAF fix it sounds like a
> different problem:
> 
>     The driver shutdown callback (which sends EDL_SOC_RESET to the device
>     over serdev) should not be invoked when HCI device is not open (e.g. if
>     hci_dev_open_sync() failed), because the serdev and its TTY are not open
>     either.  Also skip this step if device is powered off
>     (qca_power_shutdown()).
> 
> So if hci_dev_open_sync has failed it says serdev and its TTY will not
> be open either, so I guess that's why HCI_SETUP was added as a
> condition to bail out? So it seems correct to do that although I'd
> change the comments.
> 
Yes, I agree with you on these points. Zijun's change is able to fix this
different problem as well.
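
For comparison, the two bail-out conditions in the diff can be written as
plain predicates. This is only a userspace sketch: struct model_state and
both function names are hypothetical, and the example state assumes a machine
on which HCI_QUIRK_NON_PERSISTENT_SETUP is not set (no enable-gpios).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the flags tested in qca_serdev_shutdown(). */
struct model_state {
	bool bt_off;			/* models QCA_BT_OFF in qca->flags */
	bool hci_running;		/* models HCI_RUNNING in hdev->flags */
	bool quirk_non_persistent;	/* models HCI_QUIRK_NON_PERSISTENT_SETUP */
	bool hci_setup;			/* models the HCI_SETUP dev flag */
};

/* Condition removed by the patch (added by commit 272970be3dab). */
static bool model_skip_vsc_before_patch(const struct model_state *s)
{
	assert(s != NULL);
	return s->bt_off || !s->hci_running;
}

/* Condition added by the v2 patch. */
static bool model_skip_vsc_v2(const struct model_state *s)
{
	assert(s != NULL);
	return s->quirk_non_persistent || s->hci_setup;
}
```

For the state after "enable BT -> disable BT" on such a machine (HCI_RUNNING
cleared, setup already done, quirk not set), the old condition skips the VSC,
which matches regression issue A, while the v2 condition sends it. When
HCI_SETUP is still set or the quirk is set, the v2 condition skips the VSC.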
> @Krzysztof Kozlowski do you still have a test setup for 272970be3dab
> ("Bluetooth: hci_qca: Fix driver shutdown on closed serdev"), can you
> try with these changes?
> 