Re: path to landing patch to fix warm boot issue for qca6390

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 5/10/24 2:25 PM, Luiz Augusto von Dentz wrote:
Hi Wren,

On Fri, May 10, 2024 at 4:54 PM Wren Turkal <wt@xxxxxxxxxxxxxxxx> wrote:

On 5/10/24 12:48 PM, Luiz Augusto von Dentz wrote:
Hi Wren,

On Fri, May 10, 2024 at 3:14 PM Wren Turkal <wt@xxxxxxxxxxxxxxxx> wrote:

On 5/6/24 12:49 PM, Luiz Augusto von Dentz wrote:
Hi Wren,

On Mon, May 6, 2024 at 3:24 PM Wren Turkal <wt@xxxxxxxxxxxxxxxx> wrote:

Krzysztof,

I am reaching out to you as you had the most important objections to the
change to fix qca6390 for the warm boot/module reload bug that I am
experiencing.

For context, the problem is that the hci_uart module will send specific
vendor specfic commands during shutdown of the hardware under most
situations. These VSCs put the bluetooth device into a non-working state
on my Dell XPS 13 9310 with qca6390 bluetooth hardware.

Zijun's proposed fix is to not send these commands when it's not
appropriate for the hardware. The vendor commands should be avoided when
the hardware does not have persistent configuration or when the device
is in setup state (indicating that is has never been setup and should
not be sent the VSCs on the shutdown path). This is what Zijun's patch
implements.

In addition, Zijun's change removes the influence of both
the QCA_BT_OFF qca flag and and HCI_RUNNING hdev flag. Zijun asserts
that those flags should not influence the sending of the VSCs in the
shutdown path. If I understand KK's objections properly, this is where
his objection is stemming from. KK, is this correct?

Zijun's proposed fix can be seen here:
https://patchwork.kernel.org/project/bluetooth/patch/1713932807-19619-3-git-send-email-quic_zijuhu@xxxxxxxxxxx/

I'm wondering if we can resolve this impasse by splitting the change
into two changes, as follows:

1. Change that removes the influence of the QCA_BT_OFF and HCI_RUNNING
flags in the shutdown path.
2. Add the quirk from Zijun's patch that fixes my hardward configuration.

I'm hoping that better clearer descriptions for #1 can help get that
landed since the logic current appears to be at odds with how the
hardware works.

Also, I am happy to split the patches into the two patches, or (maybe
more ideally) just modify the commit message to better indicate the
reason the change. I just need guidance from maintainers so that
whatever work I do leads to something acceptable for y'all.

So, please help me get this done. I am just a user with broken hardware
and a fondness for Linux. I would love to help do what's needed to get
this fix landed.

Ive also objected to that change, in fact the whole shutdown sequence
is sort of bogus in my opinion and the driver shall really have some
means to find out what mode it is in when it reboots, regardless if
cold or warm boot, since otherwise we are in trouble if the user is
booting from another OS that doesn't do the expected shutdown
sequence.

This criticism makes a ton of sense. I'm sorry I missed it before. There
were a lot of threads moving in parallel. However, I am curious. Given
that the patch improves the situation for users (like me). Is there any
way we can separate the redesign of the shutdown sequence and the UX
improvement that comes with this patch?

Here's my concern. I am happy to do the work to redesign this. However,
I don't think I have the information needed to do this since I don't
have access to the technical docs for the qca6390. I am worried that not
accepting some form of this patch is letting perfect be the enemy of the
good. And I am not sure how I personally can help with that. If you
think it's possible for me to do this without the docs for the hardware,
I am willing to give it a shot if I can get some guidance. Honestly, I
wish I had the skill to be confident about a change like this, but I don't.

Any ideas on how to move forward would be greatly appreciated.

And just to be perfectly clear, I have tested this patch on my laptop.
It greatly enhances my ability to use my hardware since I can reboot the
machine without having to make sure to power cycle the laptop. This is
not a theoretical improvement.

I would really love some explanation why can't the driver know under
what mode the controller is when it gets probed, because to me we
cannot accept a driver that only works under certain condition after
the boot and in case it is really impossible, can't even power cycle
it to get it back to cold boot stage???

This is a great technical criticism of the driver, and I think you
deserve that explanation.

However, with the driver already in the kernel, shouldn't the bias be
toward mitigating the extremely bad UX and not hold users hostage for
the bad design which has already been approved and landed in the kernel?

Also the criticism here should be directed to the vendor, how long
have we been discussing problems in the QCA driver? And the only thing
I see coming our way are work-arounds of the problems, the address not
being unique coming from the firmware itself and when provided via DT
the address is in the wrong byteorder and now that the driver must
communicate the firmware on shutdown in order to get it working
properly on the next boot.

I agree that Qualcomm should get flack for this, however, the UX problem
can be mitigated with a logic fix that doesn't make the init/shutdown
design problem any worse than it currently seems to be. I mean, wouldn't
this logic have to exist somewhere even if it weren't the shutdown path?

If you are trying to use this as leverage to get Qualcomm to do a bigger
thing (redesign the init/shutdown logic), I do think that tactic
needlessly puts users in the crossfire. I can totally understand why
you'd do it. I am just suffering the crossfire in the meantime, and it
doesn't feel great.

So you prefer to risk getting a kernel crash due to UAF over Bluetooth
not working? Really? Because I haven't seen any configuration that
those changes you tested don't reintroduce the UAF, which is why I
haven't applied that change to begin with, so no I'm not holding back
to pressure Qualcomm to redesign the shutdown logic, it just these
things got entangled because I just realized the shutdown thingy is
really out of place, imo, but I'd be fine if there is a temporary fix
until someone finally decide to spend some time to really fix the
shutdown logic.

Luiz, I'm sorry. I do not want a crash instead. I didn't understand that the solution I proposed (i.e. adding Zijun's logic without removing KK's logic) would introduce a new crash opportunity. I previously thought you were saying one of the following things:
1. The crash opportunity already existed due the init/shudown sequences.
2. The crash opportunity already existed due the init/shudown sequences when removing KK's logic.

If it was #1, I was hoping that adding the logic would make the risk no worse.

If it was #2, I was hoping that my suggestion of adding Zijun's logic without removing KK's logic might represent an acceptable middleground for a temporary fix that would "correct" the logic without introducing a new crash opportunity.

I feel like I may not be clear about what I mean by adding Zijun's logic and not removing KK's logic. Maybe something like this diff:

diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
index 2f7ae38d85eb..fcac44ae7898 100644
--- a/drivers/bluetooth/hci_qca.c
+++ b/drivers/bluetooth/hci_qca.c
@@ -2456,6 +2456,10 @@ static void qca_serdev_shutdown(struct device *dev)
                     !test_bit(HCI_RUNNING, &hdev->flags))
                         return;

+               if (test_bit(HCI_QUIRK_NON_PERSISTENT_SETUP,
&hdev->quirks) ||
+                   hci_dev_test_flag(hdev, HCI_SETUP))
+                       return;
+
                 serdev_device_write_flush(serdev);
                 ret = serdev_device_write_buf(serdev, ibs_wake_cmd,
                                               sizeof(ibs_wake_cmd));

I think this diff is mangled due to using Thunderbird, but I hope this helps convey what I was asking about.

If I am understanding you correctly now, you are saying that simply introducing Zijun's logic (without removing KK's logic) will introduce a new crash opportunity. Is that correct?

Thanks,
wt
--
You're more amazing than you think!




[Index of Archives]     [Bluez Devel]     [Linux Wireless Networking]     [Linux Wireless Personal Area Networking]     [Linux ATH6KL]     [Linux USB Devel]     [Linux Media Drivers]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [Big List of Linux Books]

  Powered by Linux