> On 20 Dec 2017, at 7:11 AM, Brian Norris <briannorris@xxxxxxxxxxxx> wrote: > > Hi Kai, > > On Tue, Dec 19, 2017 at 12:28:17PM +0800, Kai Heng Feng wrote: >>> On 19 Dec 2017, at 2:13 AM, Brian Norris <briannorris@xxxxxxxxxxxx> wrote: >>> On Mon, Dec 18, 2017 at 12:43:48PM +0100, Greg Kroah-Hartman wrote: >>>> On Fri, Dec 15, 2017 at 07:05:39PM -0800, Matthias Kaehlcke wrote: >>>>> We identified the above patch as the culprit, in combination with USB >>>>> autosuspend being enabled for the Bluetooth controller. >>>>> >>>>> We found that the following (recent) upstream patch fixes the issue on >>>>> most devices (we are still investigating one case on a specific device): >>> >>> A key word to highlight here is "most". I have at least one device that >>> is consistently having problems even with both patches included; the >>> only way things work reliably so far is to simply revert the $subject >>> patch in 4.4.x. >> >> The problem we have is that the QCA Rome Bluetooth stops working >> after system S3. The reset resume quirk workaround the issue on both >> mainline and 4.4.x (at least for me). > > Understood. But that's not the case for all systems, and on some, you're > regressing functionality. Yes. So reverting the commit is a reasonable path we should take. > > Many systems keep power enabled for system suspend, and so that "fix" is > not necessary for them. It's also not very precise, because it seems to > act in many cases where it need not -- for one, it takes effect on *all* > resume attempts, even resuming from autosuspend. I really doubt your > system is losing USB power in S0 + USB autosuspend? No, USB power is on. Not seeing my XHCI gets put to D3cold. > > So, if you really need this patch for some systems, it seems like it > should be much better targeted. You're currently adding a lot of > unnecessary overhead and fragility. You are right. Applying this unconditionally to all QCA Rome may be overkill. > >>> I'll try to investigate further, since this result is a bit confusing >>> still. If there's a real problem with the patch (and I suspect there >>> might be), it probably shouldn't be in mainline either… >> >> Do you have the same issue on mainline? > > Refer to my note [1] :) > > But because you asked, I did the work necessary to boot mainline on this > system, and in fact, I see the same problem. I think that's enough > reason to revert your patch in all branches (upstream, and any -stable > branch that already took it). > > To be clear, here's the kernel logs on 4.15-rc4+, where the 55-second > mark is around where I try to power on and scan for BT devices (power-on > and scan both fail): > > [ 2.323475] usb 3-1: new full-speed USB device number 2 using ohci-platform > ... > [ 2.731719] usb 3-1: New USB device found, idVendor=0cf3, idProduct=e300 > ... > [ 2.867870] usb 3-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0 > ...(udev doesn't run until here)... > [ 35.733139] usbcore: registered new interface driver btusb > [ 35.740912] Bluetooth: hci0: using rampatch file: qca/rampatch_usb_00000302.bin > [ 35.749216] Bluetooth: hci0: QCA: patch rome 0x302 build 0x3e8, firmware rome 0x302 build 0x111 > ... > [ 35.816104] Bluetooth: hci0: using NVM file: qca/nvm_usb_00000302.bin > ... > [ 55.073503] usb 3-1: reset full-speed USB device number 2 using ohci-platform > ... > [ 57.772513] Bluetooth: hci0: urb 00000000ec39086b failed to resubmit (113) > [ 57.780217] Bluetooth: hci0: urb 0000000066228fda failed to resubmit (113) > > Whereas reverting the $subject patch yields no reset and URB failure; > Bluetooth seems to work fine. Thanks for your testing on mainline! > >>>>> commit a0085f2510e8976614ad8f766b209448b385492f >>>>> Author: Sukumar Ghorai <sukumar.ghorai@xxxxxxxxx> >>>>> Date: Wed Aug 16 14:46:55 2017 -0700 >>>>> >>>>> Bluetooth: btusb: driver to enable the usb-wakeup feature >>> [...] >>>>> stable branches are currently broken for BTUSB_QCA_ROME with USB >>>>> autosuspend enabled, since the above patch is not included (I only >>>>> checked v4.4 and v4.9), so we probably want to integrate it. >>>> >>>> Now queued up, thanks for letting me know. >>> >>> I think you have almost as much information as I do at this point, and >>> I'll try to reply back here if I figure out anything more, so I'll leave >>> you to your decisions. But I would personally revert, not backport more >>> patches. >>> >>> Among the reasons: >>> (a) I have at least one device that does not work better with both >>> patches [1] >>> (b) So far, I haven't seen any explanation why commit a0085f2510e8 >>> should be a dependency for $subject patch; the closest I can imagine >>> would be that commit a0085f2510e8 could effectively neutralize >>> $subject patch -- if the device is marked as a wakeup source, then >>> it will never try to RESET on resume -- but that's still a fuzzy >>> signal; just because it's marked a wakeup source sometimes doesn't >>> mean it always will be. We can disable it in user space too. >> >> Hi Leif, >> >> Can you try if a0085f2510e8 breaks your $subject commit? >> If it’s a yes, then what Brian suggests is correct. >> >> Also, can you share the output of "usb-device” for your btusb device? > > FWIW, here's mine: > > T: Bus=03 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 2 Spd=12 MxCh= 0 > D: Ver= 2.01 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs= 1 > P: Vendor=0cf3 ProdID=e300 Rev=00.01 > C: #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=100mA > I: If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb > I: If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb > > This is an integrated Wifi+BT QCA6174A-5 M.2 module. Thanks for the info. I found the same BT on a Killer Wireless, I’ll do some testing on it. > >>> (c) Isn't it safer to revert than to backport more? You're delving into >>> feature-land, not bugfix-land… >> >> Sounds reasonable. > > Shall I post a proper revert for mainline, with Fixes tags and CC > stable? Or did one of you want to do that? Let me clean my own mess ;) > >> We’ll need more information from Leif if we need to do ID-specific quirks. > > As I see it, this probably isn't just an ID-specific thing; this patch > looks bad for its intended purpose, per my comments above (it's > overkill, for one). I think there are many factors involved here, and it > needs more study than "I just booted, suspended/resumed once, and it > worked better." > > Among the things to consider: > * How do we determine the likelihood that the device gets powered off in > S3? The device_may_wakeup() check can be influenced (among other > things) by user space policy, so that doesn't really look right to me. > (ID-based checks might help a little, since you can probably > differentiate chips that are likely used on discrete/removable USB > parts, rather than internal chips?) Actually, I think we can’t know for sure. From my own experience, it’s pretty ODM specific. Some external chips keep its power under S3. > * The ordering: right now, you're enabling "reset on resume" at probe > time, but QCA ROME devices only load their firmware in btusb_open(). > This gives a window of time in which you're adding weird reset > behavior while the device may be running its ROM firmware, not the > patched firmware loaded from /lib/firmware/. That might be one reason > things don't work out so well? You are right, the reset on resume should be used when USB probing, instead of in the middle of btusb_open(). > I also suspect that might be one reason that Matthias sometimes saw > the problem disappear on his devices, where I didn't. I have my device > configured to cold reset the BT device on every reboot, whereas his > isn't; it might retain the FW in the BT device's RAM across reboot. > So my behavior was more deterministic :) > * The behavior here is affected by USB autosuspend, not just S3. I > mentioned this above, but more specifically, ChromeOS has udev rules > like this, to enable autosuspend on various sorts of devices > (including bluetooth controllers + BT HID devices): > > <udev rule snippet> > ATTR{idVendor}=="0cf3", ATTR{idProduct}=="e300", GOTO="autosuspend_enable" > ... > # Enable autosuspend > LABEL="autosuspend_enable" > TEST=="power/control", ATTR{power/control}="auto", GOTO="autosuspend_end" > > LABEL="autosuspend_end" > </udev rule snippet> > > I suspect some desktop systems might not have the same policies, so > you're not testing autosuspend in the same way I am? Actually it’s same from kernel’s perspective. I use TLP, it does the same thing to the power/control knob. Kai-Heng > > Brian > >> Kai-Heng >> >>> Brian >>> >>> [1] Before you ask: this particular device is not quite fully supported >>> on upstream yet, though its sibling devices are. With a bit of effort, I >>> might be able to test a clean upstream. At the moment, I'm using our >>> Chrom{e,ium}OS 4.4 kernel, where we regularly merge in -stable updates.