> Blacklisting the i2c-designware driver should do the trick.

OK, thanks!

On Tue, May 22, 2018 at 12:29 AM, Hans de Goede <hdegoede@xxxxxxxxxx> wrote:
> Hi,
>
> On 22-05-18 01:58, wereturtle wrote:
>>
>> By the way,
>>
>>> If this indeed is an Intel system one thing worth trying is
>>> changing the clk_freq we use in the i2c-designware driver to
>>> calculate clk high and low counts. There are some reports from
>>> people attaching scopes to the i2c wires that on kabylake at
>>> least the driver is driving the bus about 1.5 times the
>>> expected rate (so 600KHz instead of 400KHz).
>>
>> If I theoretically ran into the issue with the i2c interrupts
>> driving my CPU cores up to 100% once again in the future, what could I
>> do to work around it? Would blacklisting the i2c-designware driver
>> fix that? Would that have negative side effects for other things? Is
>> there a kernel parameter to make the gpio i2c not send so many
>> interrupts?
>
> Blacklisting the i2c-designware driver should do the trick.
>
> Regards,
>
> Hans
>
>>
>> I just want to make sure a kernel update in the future does not brick
>> the laptop, as by the time the patch officially arrives in my distro I
>> would be well outside my return window with Best Buy. (I'm not
>> discounting Jarkko's ability to make a smooth patch, but this is for a
>> doomsday scenario.)
>>
>> Thanks!
>>
>> On Mon, May 21, 2018 at 12:56 PM, wereturtle <wereturtledev@xxxxxxxxx> wrote:
>>>>
>>>> I would expect 4.18 given that 4.17 is more or less a done deal, but
>>>> once the patch is out I expect it to also be cherry-picked as a bug-fix
>>>> to the stable releases of older kernels.
>>>
>>> Sounds great! Thanks, Hans! And thank you, to all of you!
>>>
>>> On Mon, May 21, 2018 at 11:12 AM, Hans de Goede <hdegoede@xxxxxxxxxx> wrote:
>>>>
>>>> Hi,
>>>>
>>>> On 21-05-18 19:02, wereturtle wrote:
>>>>>
>>>>> Hi Hans,
>>>>>
>>>>>> Have you tried turning off the computer and removing the battery,
>>>>>> then wait 5 minutes and put the battery back again?
>>>>>
>>>>> OK, I had to run to the store to get the right size Torx screwdriver,
>>>>> but I removed the battery and waited 15 minutes before putting it back
>>>>> in. Unfortunately, that didn't help.
>>>>>
>>>>>> Likely having the touchpad properly working results in either the touchpad
>>>>>> or the i2c controller firing an interrupt at boot because of state left
>>>>>> over from the previous boot with working touchpad. Since you now lack
>>>>>> a working driver, nothing acks the interrupt and it keeps firing.
>>>>>
>>>>> Interesting! Thanks for teaching me!
>>>>>
>>>>>> The proper solution here would be to build 4.15 with the fix, or
>>>>>> see if there are some patches to the nvidia driver to make it build
>>>>>> with 4.17, which avoids the need for another kernel build.
>>>>>
>>>>> Alas, I've had too much difficulty with the latter solutions. 4.17
>>>>> vanilla (no patch) seems to ack it already. 4.15 would need something
>>>>> else to ack it in the first place, and I'm not sure where to patch
>>>>> that. I couldn't find any Nvidia patches for 4.17.
>>>>>
>>>>> Fortunately, Best Buy let me return the laptop without any fuss. They
>>>>> were even offering to exchange it. I declined, but I might try again
>>>>> and live without the touchpad for a time since the laptop just went on
>>>>> sale this morning.
>>>>>
>>>>> As such, when, roughly, do you think the official patch will land?
>>>>> Kernel 4.17 or 4.18? (Supposedly Ubuntu 18.10 will have 4.18, if the
>>>>> stars align.)
>>>>
>>>> I would expect 4.18 given that 4.17 is more or less a done deal, but
>>>> once the patch is out I expect it to also be cherry-picked as a bug-fix
>>>> to the stable releases of older kernels.
>>>>
>>>> Jarkko, I did not see an official patch for this yet, I'm not on the
>>>> list though, so I don't know if it was not posted at all, or if you did
>>>> not Cc me? (not Cc-ing me is fine I'm only sideways involved).
>>>>
>>>> Regards,
>>>>
>>>> Hans
>>>>
>>>>>
>>>>> Thanks!
>>>>>
>>>>> On Sun, May 20, 2018 at 3:13 AM, Hans de Goede <hdegoede@xxxxxxxxxx> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> On 20-05-18 04:34, wereturtle wrote:
>>>>>>>
>>>>>>> Some bad news to follow up the good news.
>>>>>>>
>>>>>>> Installing the patched Kernel for my touchpad had a negative side
>>>>>>> effect. While running the patched Kernel, I didn't have any issues.
>>>>>>> However, I couldn't get the Nvidia driver to install with this Kernel.
>>>>>>> As such, I tried rebooting into my old 4.15 Kernel. Even after
>>>>>>> removing the patched Kernel and reinstalling the Nvidia driver several
>>>>>>> times for 4.15, my computer became sluggish during browsing, typing,
>>>>>>> etc. Games were locking up or else having a huge framerate drop. My
>>>>>>> CPU cores were spinning like crazy without even any processes taking
>>>>>>> up CPU.
>>>>>>>
>>>>>>> Further investigation revealed an "unexpected IRQ trap at vector 9a"
>>>>>>> error message at startup and shutdown in the console. The message
>>>>>>> fires constantly. Under /proc/interrupts, it was listing intel-gpio
>>>>>>> (i2c) for 9a. It was firing off like crazy. I think that's for my
>>>>>>> touchpad?
>>>>>>>
>>>>>>> I tried reinstalling Kubuntu altogether, twice, and it wouldn't stop.
>>>>>>> It's like that patch permanently wrote something to my hardware? I
>>>>>>> tried installing 4.17 RC5 with Ukuu without the touchpad patch built
>>>>>>> in.
>>>>>>> The intel-gpio interrupts went away, and the computer is snappy
>>>>>>> and responsive again. However, rebooting back into 4.15 resulted in
>>>>>>> the interrupts returning.
>>>>>>>
>>>>>>> How did this patch end up doing something permanently to my computer,
>>>>>>> and what can I do to undo it for Kernel 4.15 so that I can use my
>>>>>>> Nvidia drivers again?
>>>>>>
>>>>>> Have you tried turning off the computer and removing the battery,
>>>>>> then wait 5 minutes and put the battery back again?
>>>>>>
>>>>>> Likely having the touchpad properly working results in either the touchpad
>>>>>> or the i2c controller firing an interrupt at boot because of state left
>>>>>> over from the previous boot with working touchpad. Since you now lack
>>>>>> a working driver, nothing acks the interrupt and it keeps firing.
>>>>>>
>>>>>> The proper solution here would be to build 4.15 with the fix, or
>>>>>> see if there are some patches to the nvidia driver to make it build
>>>>>> with 4.17, which avoids the need for another kernel build.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Hans
>>>>>>
>>>>>>>
>>>>>>> Also, I don't notice this same sluggishness with Windows 10. Windows
>>>>>>> is snappy regardless of the Kernel.
>>>>>>>
>>>>>>> On Sat, May 19, 2018 at 12:42 PM, wereturtle <wereturtledev@xxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>> Hi everyone!
>>>>>>>>
>>>>>>>> I set the clk_rate to be 216000000 in my own patched Kernel 4.17 RC5,
>>>>>>>> and my touchpad now works!
>>>>>>>>
>>>>>>>> Thank you so much!
>>>>>>>>
>>>>>>>> On Fri, May 18, 2018 at 12:39 AM, Hans de Goede <hdegoede@xxxxxxxxxx> wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> On 18-05-18 09:32, Hans de Goede wrote:
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> On 17-05-18 20:14, Dmitry Torokhov wrote:
>>>>>>>>>>>
>>>>>>>>>>> On Thu, May 17, 2018 at 2:36 AM, Benjamin Tissoires
>>>>>>>>>>> <benjamin.tissoires@xxxxxxxxx> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Scope (_SB.PCI0.I2C1)
>>>>>>>>>>>> {
>>>>>>>>>>>>     Device (ETPD)
>>>>>>>>>>>>     {
>>>>>>>>>>>>         Name (SBFB, ResourceTemplate ()
>>>>>>>>>>>>         {
>>>>>>>>>>>>             I2cSerialBusV2 (0x004C, ControllerInitiated, 0x00061A80,
>>>>>>>>>>>>                 AddressingMode7Bit, "\\_SB.PCI0.I2C1",
>>>>>>>>>>>>                 0x00, ResourceConsumer, _Y34, Exclusive,
>>>>>>>>>>>>                 )
>>>>>>>>>>>>         })
>>>>>>>>>>>>         Name (SBFI, ResourceTemplate ()
>>>>>>>>>>>>         {
>>>>>>>>>>>>             Interrupt (ResourceConsumer, Level, ActiveHigh, Exclusive, ,, )
>>>>>>>>>>>>             {
>>>>>>>>>>>>                 0x0000005F,
>>>>>>>>>>>>             }
>>>>>>>>>>>>         })
>>>>>>>>>>>
>>>>>>>>>>> ...
>>>>>>>>>>>>
>>>>>>>>>>>> So nothing scary, the interrupt is a plain interrupt, not a GPIO. I
>>>>>>>>>>>> guess the issue lies in i2c-designware and the AMD
>>>>>>>>>>>> implementation...
>>>>>>>>>>>
>>>>>>>>>>> Also, in dmesg we have:
>>>>>>>>>>>
>>>>>>>>>>> [ 25.020612] cannonlake-pinctrl INT3450:00: pin 26 cannot be used as IRQ
>>>>>>>>>>> [ 25.020615] genirq: Setting trigger mode 3 for irq 137 failed (intel_gpio_irq_type+0x0/0x140)
>>>>>>>>>>> [ 25.023113] intel-lpss 0000:00:15.1: enabling device (0000 -> 0002)
>>>>>>>>>>> [ 25.023336] idma64 idma64.1: Found Intel integrated DMA 64-bit
>>>>>>>>>>> [ 25.025326] i2c_hid i2c-ELAN1201:00: i2c-ELAN1201:00 supply vdd not found, using dummy regulator
>>>>>>>>>>> [ 25.025494] i2c_designware i2c_designware.1: i2c_dw_handle_tx_abort: lost arbitration
>>>>>>>>>>> [ 25.025652] i2c_designware i2c_designware.1: i2c_dw_handle_tx_abort: lost arbitration
>>>>>>>>>>> [ 25.025811] i2c_designware i2c_designware.1: i2c_dw_handle_tx_abort: lost arbitration
>>>>>>>>>>> [ 25.025970] i2c_designware i2c_designware.1: i2c_dw_handle_tx_abort: lost arbitration
>>>>>>>>>>> [ 25.025972] i2c_hid i2c-ELAN1201:00: hid_descr_cmd failed
>>>>>>>>>>>
>>>>>>>>>>> 0x5F is kind of high for a plain interrupt; I wonder if ACPI table
>>>>>>>>>>> relies on static gpio->virq mapping that could be different on
>>>>>>>>>>> Linux... Also I am surprised the IRQ is active-HIGH, normally it is
>>>>>>>>>>> active low. Might want to try and hack the driver to force it to low
>>>>>>>>>>> and see what happens...
>>>>>>>>>>
>>>>>>>>>> Yes the interrupt is definitely suspect.
>>>>>>>>>> Actually using plain interrupts
>>>>>>>>>> rather than a GpioInt is something which I would only expect to see in
>>>>>>>>>> old DSDTs and not in recent ones, because for i2c devices there is
>>>>>>>>>> no clear parent interrupt controller and as such no well defined way to
>>>>>>>>>> properly interpret a raw Interrupt number.
>>>>>>>>>>
>>>>>>>>>> What is with the AMD reference btw, the above dmesg snippet looks
>>>>>>>>>> to be about an Intel system? I would not expect cannonlake-pinctrl
>>>>>>>>>> to be used on an AMD system...
>>>>>>>>>>
>>>>>>>>>> If this indeed is an Intel system one thing worth trying is
>>>>>>>>>> changing the clk_freq we use in the i2c-designware driver to
>>>>>>>>>> calculate clk high and low counts. There are some reports from
>>>>>>>>>> people attaching scopes to the i2c wires that on kabylake at
>>>>>>>>>> least the driver is driving the bus about 1.5 times the
>>>>>>>>>> expected rate (so 600KHz instead of 400KHz).
>>>>>>>>>>
>>>>>>>>>> A workaround for now would be to edit:
>>>>>>>>>>
>>>>>>>>>> drivers/mfd/intel-lpss-pci.c
>>>>>>>>>>
>>>>>>>>>> And change clk_rate in:
>>>>>>>>>>
>>>>>>>>>> static const struct intel_lpss_platform_info spt_i2c_info = {
>>>>>>>>>>         .clk_rate = 120000000,
>>>>>>>>>>         .properties = spt_i2c_properties,
>>>>>>>>>> };
>>>>>>>>>>
>>>>>>>>>> From 120000000 to 180000000, people are still working on getting
>>>>>>>>>> to the bottom of this but it is worth a shot. The clk_rate
>>>>>>>>>> value here is only used to calculate i2c timings and does
>>>>>>>>>> not actually program a clock, it only specifies the frequency
>>>>>>>>>> the clock is expected to be running at. So changing this should
>>>>>>>>>> be safe.
>>>>>>>>>
>>>>>>>>> Ok, so I just read the new mails in the threads where this is being
>>>>>>>>> discussed and it has been confirmed by Intel that for all Cannon Lake
>>>>>>>>> devices the correct clk_rate is 216000000, which likely explains
>>>>>>>>> the i2c errors here. Jarkko (added to the Cc) is working on a patch
>>>>>>>>> for this.
>>>>>>>>>
>>>>>>>>> For now if you can build your own kernels you can make the change I
>>>>>>>>> suggested above, but that will also change the clock-rate on other
>>>>>>>>> machines, so that is just for testing on Cannon Lake hardware!
>>>>>>>>>
>>>>>>>>> The way the Interrupt is specified is still suspicious btw, but
>>>>>>>>> we'll cross that bridge when we get there.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>>
>>>>>>>>> Hans
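
Editor's note on the clk_rate arithmetic discussed in the thread: the sketch below is a rough illustration, not the i2c-designware driver's actual calculation. It assumes a deliberately simplified model in which the driver divides the clock rate it believes it has by the requested bus speed to get the number of input-clock cycles per SCL period, and the real bus speed is the true clock rate divided by that count. The real driver derives separate high and low counts from I2C timing parameters, so the numbers here are only indicative. The function names are hypothetical. The 400000 Hz target is the 0x00061A80 connection speed from the I2cSerialBusV2 entry quoted above. The 120000000, 180000000 and 216000000 values are the clk_rate figures mentioned in the thread.

```python
# Simplified, illustrative model only (an assumption, not the driver's code):
# the bus period is programmed as a whole number of input-clock cycles,
# computed from the clock rate the driver *assumes* it is running at.

TARGET_BUS_HZ = 0x00061A80  # 400000 Hz, from the quoted I2cSerialBusV2 entry


def scl_cycle_count(assumed_clk_hz: int, target_bus_hz: int) -> int:
    """Input-clock cycles per SCL period, based on the assumed clock rate."""
    return assumed_clk_hz // target_bus_hz


def actual_bus_hz(assumed_clk_hz: int, actual_clk_hz: int, target_bus_hz: int) -> float:
    """Bus speed that results when the real clock differs from the assumed one."""
    return actual_clk_hz / scl_cycle_count(assumed_clk_hz, target_bus_hz)


if __name__ == "__main__":
    actual_clk = 216_000_000  # rate Intel confirmed for Cannon Lake, per the thread
    for assumed in (120_000_000, 180_000_000, 216_000_000):
        hz = actual_bus_hz(assumed, actual_clk, TARGET_BUS_HZ)
        print(f"clk_rate={assumed}: bus runs at about {hz / 1000:.0f} kHz")
```

Under this model, a clk_rate of 120000000 on hardware that really runs at 216000000 would drive a 400 kHz bus at roughly 720 kHz, while 216000000 gives the requested 400 kHz. That is consistent with the thread's report that setting clk_rate to 216000000 made the touchpad work, but it is not a measurement.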