Re: path to landing patch to fix warm boot issue for qca6390

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Paul,

On Fri, May 10, 2024 at 6:57 PM Paul Menzel <pmenzel@xxxxxxxxxxxxx> wrote:
>
> Dear Luiz,
>
>
> Am 10.05.24 um 23:25 schrieb Luiz Augusto von Dentz:
>
> > On Fri, May 10, 2024 at 4:54 PM Wren Turkal <wt@xxxxxxxxxxxxxxxx> wrote:
> >>
> >> On 5/10/24 12:48 PM, Luiz Augusto von Dentz wrote:
>
> >>> On Fri, May 10, 2024 at 3:14 PM Wren Turkal <wt@xxxxxxxxxxxxxxxx> wrote:
> >>>>
> >>>> On 5/6/24 12:49 PM, Luiz Augusto von Dentz wrote:
>
> >>>>> On Mon, May 6, 2024 at 3:24 PM Wren Turkal <wt@xxxxxxxxxxxxxxxx> wrote:
> >>>>>>
> >>>>>> Krzysztof,
> >>>>>>
> >>>>>> I am reaching out to you as you had the most important objections to the
> >>>>>> change to fix qca6390 for the warm boot/module reload bug that I am
> >>>>>> experiencing.
> >>>>>>
> >>>>>> For context, the problem is that the hci_uart module will send specific
> >>>>>> vendor specfic commands during shutdown of the hardware under most
> >>>>>> situations. These VSCs put the bluetooth device into a non-working state
> >>>>>> on my Dell XPS 13 9310 with qca6390 bluetooth hardware.
> >>>>>>
> >>>>>> Zijun's proposed fix is to not send these commands when it's not
> >>>>>> appropriate for the hardware. The vendor commands should be avoided when
> >>>>>> the hardware does not have persistent configuration or when the device
> >>>>>> is in setup state (indicating that is has never been setup and should
> >>>>>> not be sent the VSCs on the shutdown path). This is what Zijun's patch
> >>>>>> implements.
> >>>>>>
> >>>>>> In addition, Zijun's change removes the influence of both
> >>>>>> the QCA_BT_OFF qca flag and and HCI_RUNNING hdev flag. Zijun asserts
> >>>>>> that those flags should not influence the sending of the VSCs in the
> >>>>>> shutdown path. If I understand KK's objections properly, this is where
> >>>>>> his objection is stemming from. KK, is this correct?
> >>>>>>
> >>>>>> Zijun's proposed fix can be seen here:
> >>>>>> https://patchwork.kernel.org/project/bluetooth/patch/1713932807-19619-3-git-send-email-quic_zijuhu@xxxxxxxxxxx/
> >>>>>>
> >>>>>> I'm wondering if we can resolve this impasse by splitting the change
> >>>>>> into two changes, as follows:
> >>>>>>
> >>>>>> 1. Change that removes the influence of the QCA_BT_OFF and HCI_RUNNING
> >>>>>> flags in the shutdown path.
> >>>>>> 2. Add the quirk from Zijun's patch that fixes my hardward configuration.
> >>>>>>
> >>>>>> I'm hoping that better clearer descriptions for #1 can help get that
> >>>>>> landed since the logic current appears to be at odds with how the
> >>>>>> hardware works.
> >>>>>>
> >>>>>> Also, I am happy to split the patches into the two patches, or (maybe
> >>>>>> more ideally) just modify the commit message to better indicate the
> >>>>>> reason the change. I just need guidance from maintainers so that
> >>>>>> whatever work I do leads to something acceptable for y'all.
> >>>>>>
> >>>>>> So, please help me get this done. I am just a user with broken hardware
> >>>>>> and a fondness for Linux. I would love to help do what's needed to get
> >>>>>> this fix landed.
> >>>>>
> >>>>> Ive also objected to that change, in fact the whole shutdown sequence
> >>>>> is sort of bogus in my opinion and the driver shall really have some
> >>>>> means to find out what mode it is in when it reboots, regardless if
> >>>>> cold or warm boot, since otherwise we are in trouble if the user is
> >>>>> booting from another OS that doesn't do the expected shutdown
> >>>>> sequence.
> >>>>
> >>>> This criticism makes a ton of sense. I'm sorry I missed it before. There
> >>>> were a lot of threads moving in parallel. However, I am curious. Given
> >>>> that the patch improves the situation for users (like me). Is there any
> >>>> way we can separate the redesign of the shutdown sequence and the UX
> >>>> improvement that comes with this patch?
> >>>>
> >>>> Here's my concern. I am happy to do the work to redesign this. However,
> >>>> I don't think I have the information needed to do this since I don't
> >>>> have access to the technical docs for the qca6390. I am worried that not
> >>>> accepting some form of this patch is letting perfect be the enemy of the
> >>>> good. And I am not sure how I personally can help with that. If you
> >>>> think it's possible for me to do this without the docs for the hardware,
> >>>> I am willing to give it a shot if I can get some guidance. Honestly, I
> >>>> wish I had the skill to be confident about a change like this, but I don't.
> >>>>
> >>>> Any ideas on how to move forward would be greatly appreciated.
> >>>>
> >>>> And just to be perfectly clear, I have tested this patch on my laptop.
> >>>> It greatly enhances my ability to use my hardware since I can reboot the
> >>>> machine without having to make sure to power cycle the laptop. This is
> >>>> not a theoretical improvement.
> >>>
> >>> I would really love some explanation why can't the driver know under
> >>> what mode the controller is when it gets probed, because to me we
> >>> cannot accept a driver that only works under certain condition after
> >>> the boot and in case it is really impossible, can't even power cycle
> >>> it to get it back to cold boot stage???
> >>
> >> This is a great technical criticism of the driver, and I think you
> >> deserve that explanation.
> >>
> >> However, with the driver already in the kernel, shouldn't the bias be
> >> toward mitigating the extremely bad UX and not hold users hostage for
> >> the bad design which has already been approved and landed in the kernel?
> >>
> >>> Also the criticism here should be directed to the vendor, how long
> >>> have we been discussing problems in the QCA driver? And the only thing
> >>> I see coming our way are work-arounds of the problems, the address not
> >>> being unique coming from the firmware itself and when provided via DT
> >>> the address is in the wrong byteorder and now that the driver must
> >>> communicate the firmware on shutdown in order to get it working
> >>> properly on the next boot.
> >>
> >> I agree that Qualcomm should get flack for this, however, the UX problem
> >> can be mitigated with a logic fix that doesn't make the init/shutdown
> >> design problem any worse than it currently seems to be. I mean, wouldn't
> >> this logic have to exist somewhere even if it weren't the shutdown path?
> >>
> >> If you are trying to use this as leverage to get Qualcomm to do a bigger
> >> thing (redesign the init/shutdown logic), I do think that tactic
> >> needlessly puts users in the crossfire. I can totally understand why
> >> you'd do it. I am just suffering the crossfire in the meantime, and it
> >> doesn't feel great.
> >
> > So you prefer to risk getting a kernel crash due to UAF over Bluetooth
> > not working? Really? Because I haven't seen any configuration that
> > those changes you tested don't reintroduce the UAF, which is why I
> > haven't applied that change to begin with, so no I'm not holding back
> > to pressure Qualcomm to redesign the shutdown logic, it just these
> > things got entangled because I just realized the shutdown thingy is
> > really out of place, imo, but I'd be fine if there is a temporary fix
> > until someone finally decide to spend some time to really fix the
> > shutdown logic.
>
> The no-regression policy is clear to not cause regressions for users.
> Even if we had crash reports from before, fixing one bug and causing
> another is unwanted to my understanding.

So you are also suggesting that a system crash is less of a problem
then a regression? I'm not opposed to having a fix to the regression,
but it needs to come without the reintroduction of a system crash,
period. The no-regression policy only applies as long as it doesn't
cause even bigger problems like a system crash or a security problem
for example, otherwise we need to find a better option, which in my
opinion is to rework the driver shutdown logic, if I had the hardware
and could experiment with I would just attempt to detect the hardware
mode at probe and if that fails force a reset and start over since the
setup stage exists exactly for the driver to have exclusive access to
the controller it can in theory power cycle the controller.

>
> Kind regards,
>
> Paul



-- 
Luiz Augusto von Dentz





[Index of Archives]     [Bluez Devel]     [Linux Wireless Networking]     [Linux Wireless Personal Area Networking]     [Linux ATH6KL]     [Linux USB Devel]     [Linux Media Drivers]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [Big List of Linux Books]

  Powered by Linux