On 2024/5/21 23:48, Luiz Augusto von Dentz wrote:
> Hi,
>
> On Tue, May 21, 2024 at 10:52 AM Lk Sii <lk_sii@xxxxxxx> wrote:
>>
>>
>>
>> On 2024/5/16 23:55, Luiz Augusto von Dentz wrote:
>>> Hi,
>>>
>>> On Thu, May 16, 2024 at 10:57 AM Lk Sii <lk_sii@xxxxxxx> wrote:
>>>>
>>>>
>>>>
>>>> On 2024/5/16 21:31, Zijun Hu wrote:
>>>>> Commit 272970be3dab ("Bluetooth: hci_qca: Fix driver shutdown on closed
>>>>> serdev") will cause below regression issue:
>>>>>
>>>>> BT can't be enabled after below steps:
>>>>> cold boot -> enable BT -> disable BT -> warm reboot -> BT enable failure
>>>>> if property enable-gpios is not configured within DT|ACPI for QCA6390.
>>>>>
>>>>> The commit is to fix a use-after-free issue within qca_serdev_shutdown()
>>>>> by adding condition to avoid the serdev is flushed or wrote after closed
>>>>> but also introduces this regression issue regarding above steps since the
>>>>> VSC is not sent to reset controller during warm reboot.
>>>>>
>>>>> Fixed by sending the VSC to reset controller within qca_serdev_shutdown()
>>>>> once BT was ever enabled, and the use-after-free issue is also fixed by
>>>>> this change since the serdev is still opened before it is flushed or wrote.
>>>>>
>>>>> Verified by the reported machine Dell XPS 13 9310 laptop over below two
>>>>> kernel commits:
>>>>> commit e00fc2700a3f ("Bluetooth: btusb: Fix triggering coredump
>>>>> implementation for QCA") of bluetooth-next tree.
>>>>> commit b23d98d46d28 ("Bluetooth: btusb: Fix triggering coredump
>>>>> implementation for QCA") of linus mainline tree.
>>>>>
>>>>> Fixes: 272970be3dab ("Bluetooth: hci_qca: Fix driver shutdown on closed serdev")
>>>>> Cc: stable@xxxxxxxxxxxxxxx
>>>>> Reported-by: Wren Turkal <wt@xxxxxxxxxxxxxxxx>
>>>>> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218726
>>>>> Signed-off-by: Zijun Hu <quic_zijuhu@xxxxxxxxxxx>
>>>>> Tested-by: Wren Turkal <wt@xxxxxxxxxxxxxxxx>
>>>>> ---
>>>>> V1 -> V2: Add comments and more commit messages
>>>>>
>>>>> V1 discussion link:
>>>>> https://lore.kernel.org/linux-bluetooth/d553edef-c1a4-4d52-a892-715549d31ebe@xxxxxxx/T/#t
>>>>>
>>>>> drivers/bluetooth/hci_qca.c | 18 +++++++++++++++---
>>>>> 1 file changed, 15 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
>>>>> index 0c9c9ee56592..9a0bc86f9aac 100644
>>>>> --- a/drivers/bluetooth/hci_qca.c
>>>>> +++ b/drivers/bluetooth/hci_qca.c
>>>>> @@ -2450,15 +2450,27 @@ static void qca_serdev_shutdown(struct device *dev)
>>>>>          struct qca_serdev *qcadev = serdev_device_get_drvdata(serdev);
>>>>>          struct hci_uart *hu = &qcadev->serdev_hu;
>>>>>          struct hci_dev *hdev = hu->hdev;
>>>>> -        struct qca_data *qca = hu->priv;
>>>>>          const u8 ibs_wake_cmd[] = { 0xFD };
>>>>>          const u8 edl_reset_soc_cmd[] = { 0x01, 0x00, 0xFC, 0x01, 0x05 };
>>>>>
>>>>>          if (qcadev->btsoc_type == QCA_QCA6390) {
>>>>> -                if (test_bit(QCA_BT_OFF, &qca->flags) ||
>>>>> -                    !test_bit(HCI_RUNNING, &hdev->flags))
>>>>> +                /* The purpose of sending the VSC is to reset SOC into a initial
>>>>> +                 * state and the state will ensure next hdev->setup() success.
>>>>> +                 * if HCI_QUIRK_NON_PERSISTENT_SETUP is set, it means that
>>>>> +                 * hdev->setup() can do its job regardless of SoC state, so
>>>>> +                 * don't need to send the VSC.
>>>>> +                 * if HCI_SETUP is set, it means that hdev->setup() was never
>>>>> +                 * invoked and the SOC is already in the initial state, so
>>>>> +                 * don't also need to send the VSC.
>>>>> +                 */
>>>>> +                if (test_bit(HCI_QUIRK_NON_PERSISTENT_SETUP, &hdev->quirks) ||
>>>>> +                    hci_dev_test_flag(hdev, HCI_SETUP))
>>>>>                          return;
>> The main purpose of above checking is NOT to make sure the serdev within
>> open state as its comments explained.
>>>>>
>>>>> +                /* The serdev must be in open state when conrol logic arrives
>>>>> +                 * here, so also fix the use-after-free issue caused by that
>>>>> +                 * the serdev is flushed or wrote after it is closed.
>>>>> +                 */
>>>>>                  serdev_device_write_flush(serdev);
>>>>>                  ret = serdev_device_write_buf(serdev, ibs_wake_cmd,
>>>>>                                                sizeof(ibs_wake_cmd));
>>>> i believe Zijun's change is able to fix both below issues and don't
>>>> introduce new issue.
>>>>
>>>> regression issue A: BT enable failure after warm reboot.
>>>> issue B: use-after-free issue, namely, kernel crash.
>>>>
>>>>
>>>> For issue B, i have more findings related to below commits ordered by time.
>>>>
>>>> Commit A: 7e7bbddd029b ("Bluetooth: hci_qca: Fix qca6390 enable failure
>>>> after warm reboot")
>>>>
>>>> Commit B: de8892df72be ("Bluetooth: hci_serdev: Close UART port if
>>>> NON_PERSISTENT_SETUP is set")
>>>> this commit introduces issue B, it is also not suitable to associate
>>>> protocol state with state of lower level transport type such as serdev
>>>> or uart, in my opinion, protocol state should be independent with
>>>> transport type state, flag HCI_UART_PROTO_READY is for protocol state,
>>>> it means if protocol hu->proto is initialized and if we can invoke its
>>>> interfaces.it is common for various kinds of transport types. perhaps,
>>>> this is the reason why Zijun's change doesn't use flag HCI_UART_PROTO_READY.
>>>
>>> Don't really follow you here, if HCI_UART_PROTO_READY indicates the
>>> protocol state they is even _more_ important to use before invoking
>>> serdev APIs, so checking for the quirk sound like a problem because:
>>>
>>> [1] hci_uart_close
>>>         /* When QUIRK HCI_QUIRK_NON_PERSISTENT_SETUP is set by driver,
>>>          * BT SOC is completely powered OFF during BT OFF, holding port
>>>          * open may drain the battery.
>>>          */
>>>         if (test_bit(HCI_QUIRK_NON_PERSISTENT_SETUP, &hdev->quirks)) {
>>>                 clear_bit(HCI_UART_PROTO_READY, &hu->flags);
>>>                 serdev_device_close(hu->serdev);
>>>         }
>>>
>>> [2] hci_uart_unregister_device
>>>         if (test_bit(HCI_UART_PROTO_READY, &hu->flags)) {
>>>                 clear_bit(HCI_UART_PROTO_READY, &hu->flags);
>>>                 serdev_device_close(hu->serdev);
>>>         }
>>>
>> both case 1 and case 2 were introduced by Commit B in question which
>> uses protocol state flag HCI_UART_PROTO_READY to track lower level
>> transport type state, i don't think it is perfect.
>>
>> for common files hci_serdev.c and hci_ldisc.c, as you saw, the purpose
>> of checking HCI_UART_PROTO_READY is to call protocol relevant
>> interfaces, moreover, these protocol relevant interfaces do not deal
>> with lower transport state. you maybe even notice below present function
>> within which lower level serdev is flushed before HCI_UART_PROTO_READY
>> is checked:
>>
>> static int hci_uart_flush(struct hci_dev *hdev)
>> {
>> ......
>>         /* Flush any pending characters in the driver and discipline. */
>>         serdev_device_write_flush(hu->serdev);
>>
>>         if (test_bit(HCI_UART_PROTO_READY, &hu->flags))
>>                 hu->proto->flush(hu);
>>
>>         return 0;
>> }
>>
>> in my opinion, that is why qca_serdev_shutdown() does not check
>> HCI_UART_PROTO_READY for later lower level serdev operations.
>>> So only in case 1 checking the quirk is equivalent to
>>> HCI_UART_PROTO_READY on case 2 it does actually check the quirk and
>>> will proceed to call serdev_device_close, now perhaps the code is
>>> assuming that shutdown won't be called after that, but it looks it
>>> does since:
>>>
>> qca_serdev_shutdown() will never be called after case 2 as explained
>> in the end.
>>> static void serdev_drv_remove(struct device *dev)
>>> {
>>>         const struct serdev_device_driver *sdrv =
>>> to_serdev_device_driver(dev->driver);
>>>         if (sdrv->remove)
>>>                 sdrv->remove(to_serdev_device(dev));
>>>
>>>         dev_pm_domain_detach(dev, true);
>>> }
>>>
>>> dev_pm_domain_detach says it will power off so I assume that means
>>> that shutdown will be called _after_ remove, so not I'm not really
>>> convinced that we can avoid using HCI_UART_PROTO_READY, in fact the
>>> following sequence might always be triggering:
>>>
>> dev_pm_domain_detach() should be irrelevant with qca_serdev_shutdown(),
>> should not trigger call of qca_serdev_shutdown() as explained in the end
>>> serdev_drv_remove -> qca_serdev_remove -> hci_uart_unregister_device
>>> -> serdev_device_close -> qca_close -> kfree(qca)
>>> dev_pm_domain_detach -> ??? -> qca_serdev_shutdown
>>>
>>> If this sequence is correct then qca_serdev_shutdown accessing
>>> qca_data will always result in a UAF problem.
>>>
>> above sequence should not correct as explained below.
>>
>> serdev and its driver should also follow below generic device and driver
>> design.
>>
>> 1)
>> driver->shutdown() will be called during shut-down time at this time
>> driver->remove() should not have been called.
>>
>> 2)
>> driver->shutdown() is impossible to be called once driver->remove()
>> was called.
>>
>> 3) for serdev, driver->remove() does not trigger call of
>> driver->shutdown() since PM relevant poweroff is irrelevant with
>> driver->shutdown() and i also don't find any PM relevant interfaces will
>> call driver->shutdown().
>>
>> i would like to explain issue B based on comments Zijun posted by public
>> as below:
>>
>> issue B actually happens during reboot and let me look at these steps
>> boot -> enable BT -> disable BT -> reboot.
>>
>> 1) step boot will call driver->probe() to register hdev and the serdev
>> is opened after boot.
>>
>> 2) step enable will call hdev->open() and the serdev will still in open
>> state
>>
>> 3) step disable will call hdev->close() and the serdev will be closed
>> after hdev->close() for machine with config which results in
>> HCI_QUIRK_NON_PERSISTENT_SETUP is set.
>>
>> 4) step reboot will call qca_serdev_shutdown() which will flush and
>> write the serdev which are closed by above step disable, so cause the
>> UAF issue, namely, kernel crash issue.
>>
>> so this issue is caused by commit B which close the serdev during
>> hdev->close().
>>
>> driver->remove() even is not triggered during above steps.
>>>> Commit C: 272970be3dab ("Bluetooth: hci_qca: Fix driver shutdown on
>>>> closed serdev")
>>>> this commit is to fix issue B which is actually caused by Commit B, but
>>>> it has Fixes tag for Commit A. and it also introduces the regression
>>>> issue A.
>>>>
>>>
>>>
>
> Reading again the commit message for the UAF fix it sounds like a
> different problem:
>
No, the UAF issue that commit C fixes should be the same issue I
described previously, as explained below:

The UAF issue happened on the machine "Qualcomm Technologies, Inc.
Robotics RB5 (DT)". That machine uses QCA6390 and has property
enable-gpios configured, which results in quirk
HCI_QUIRK_NON_PERSISTENT_SETUP being set, so it must hit the UAF issue
for the normal operation sequence
"boot -> enable BT -> disable BT -> reboot".
Actually, only machines which use QCA6390 and have property enable-gpios
configured will hit the UAF issue, as Zijun commented at the link below:
https://lore.kernel.org/linux-bluetooth/9ac11453-b7cf-43f3-8e46-f96e41ef190d@xxxxxxxxxxx/

> The driver shutdown callback (which sends EDL_SOC_RESET to the device
> over serdev) should not be invoked when HCI device is not open (e.g. if
> hci_dev_open_sync() failed), because the serdev and its TTY are not open
> either. Also skip this step if device is powered off
> (qca_power_shutdown()).
>
> So if hci_dev_open_sync has failed it says serdev and its TTY will not
> be open either, so I guess that's why HCI_SETUP was added as a
> condition to bail out? So it seems correct to do that although I'd
> change the comments.
>
I believe an hci_dev_open_sync failure should not really happen on the
Robotics RB5 machine. It is mentioned in the commit message only to
illustrate that a serdev in closed state is operated on, which causes
the UAF issue.

Let us assume hci_dev_open_sync failure -> serdev is not opened -> UAF
issue happens within qca_serdev_shutdown(). Then BT will not work at
all, and commit C is actually a workaround instead of a fix, since the
right approach is to solve the hci_dev_open_sync failure, which happens
first.

Frankly, only checking quirk HCI_QUIRK_NON_PERSISTENT_SETUP is enough to
fix the UAF issue caused by either "normal operation sequences" or
"hci_dev_open_sync failure".

> @Krzysztof Kozlowski do you still have a test setup for 272970be3dab
> ("Bluetooth: hci_qca: Fix driver shutdown on closed serdev"), can you
> try with these changes?
>
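
To make the sequences discussed above easier to follow, here is a
minimal user-space sketch of the state transitions. It is only a model
under stated assumptions, not kernel code: struct model_dev, the
model_*() helpers and enum shutdown_result are hypothetical names, the
quirk is assumed to be set exactly when enable-gpios is configured, and
the use_guard flag stands in for the two-flag check from the V2 patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_dev {
	bool non_persistent; /* stands in for HCI_QUIRK_NON_PERSISTENT_SETUP */
	bool in_setup;       /* stands in for HCI_SETUP: setup() has not run yet */
	bool serdev_open;    /* state of the underlying serdev */
};

enum shutdown_result {
	SHUTDOWN_SKIPPED,      /* guard returned early, serdev untouched */
	SHUTDOWN_RESET_SENT,   /* VSC written to an open serdev */
	SHUTDOWN_WROTE_CLOSED, /* write to a closed serdev: the UAF case */
};

/* boot: probe registers hdev and opens the serdev.
 * Assumption: the quirk is set exactly when enable-gpios is configured.
 */
static void model_probe(struct model_dev *d, bool has_enable_gpios)
{
	assert(d != NULL);
	d->non_persistent = has_enable_gpios;
	d->in_setup = true;
	d->serdev_open = true;
}

/* enable BT: the serdev is (re)opened and setup runs, clearing in_setup. */
static void model_enable(struct model_dev *d)
{
	d->serdev_open = true;
	d->in_setup = false;
}

/* disable BT: the serdev is closed only when the quirk is set. */
static void model_disable(struct model_dev *d)
{
	if (d->non_persistent)
		d->serdev_open = false;
}

/* reboot: shutdown, with or without the guard modelled on the V2 patch. */
static enum shutdown_result model_shutdown(const struct model_dev *d,
					   bool use_guard)
{
	if (use_guard && (d->non_persistent || d->in_setup))
		return SHUTDOWN_SKIPPED;
	if (!d->serdev_open)
		return SHUTDOWN_WROTE_CLOSED;
	return SHUTDOWN_RESET_SENT;
}
```

In this model, "boot -> enable BT -> disable BT -> reboot" with
enable-gpios configured reaches the closed-serdev write only when the
guard is absent. Without enable-gpios the serdev stays open across
disable, so the guarded shutdown still sends the reset, which is the
behaviour needed to avoid regression issue A.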