By the way,

> If this indeed is an Intel system one thing worth trying is
> changing the clk_freq we use in the i2c-designware driver to
> calculate clk high and low counts. There are some reports from
> people attaching scopes to the i2c wires that on kabylake at
> least the driver is driving the bus about 1.5 times the
> expected rate (so 600KHz instead of 400KHz).

If I theoretically ran into the issue with the i2c interrupts driving
my CPU cores up to 100% once again in the future, what could I do to
work around it? Would blacklisting the i2c-designware driver fix that?
Would that have negative side effects for other things? Is there a
kernel parameter to make the gpio i2c not send so many interrupts?

I just want to make sure a kernel update in the future does not brick
the laptop, as by the time the patch officially arrives in my distro I
would be well outside my return window with Best Buy. (I'm not
discounting Jarkko's ability to make a smooth patch, but this is for a
doomsday scenario.)

Thanks!

On Mon, May 21, 2018 at 12:56 PM, wereturtle <wereturtledev@xxxxxxxxx> wrote:
>> I would expect 4.18 given that 4.17 is more or less a done deal, but
>> once the patch is out I expect it to also be cherry-picked as a bug-fix
>> to the stable releases of older kernels.
>
> Sounds great! Thanks, Hans! And thank you, to all of you!
>
> On Mon, May 21, 2018 at 11:12 AM, Hans de Goede <hdegoede@xxxxxxxxxx> wrote:
>> Hi,
>>
>> On 21-05-18 19:02, wereturtle wrote:
>>>
>>> Hi Hans,
>>>
>>>> Have you tried turning off the computer and removing the battery,
>>>> then wait 5 minutes and put the battery back again?
>>>
>>> OK, I had to run to the store to get the right size Torx screwdriver,
>>> but I removed the battery and waited 15 minutes before putting it back
>>> in. Unfortunately, that didn't help.
>>>
>>>> Likely having the touchpad properly working results in either the
>>>> touchpad or the i2c controller firing an interrupt at boot because of
>>>> state left over from the previous boot with working touchpad. Since you
>>>> now lack a working driver, nothing acks the interrupt and it keeps firing.
>>>
>>> Interesting! Thanks for teaching me!
>>>
>>>> The proper solution here would be to build 4.15 with the fix, or
>>>> see if there are some patches to the nvidia driver to make it build
>>>> with 4.17, which avoids the need for another kernel build.
>>>
>>> Alas, I've had too much difficulty with the latter solutions. 4.17
>>> vanilla (no patch) seems to ack it already. 4.15 would need something
>>> else to ack it in the first place, and I'm not sure where to patch
>>> that. I couldn't find any Nvidia patches for 4.17.
>>>
>>> Fortunately, Best Buy let me return the laptop without any fuss. They
>>> were even offering to exchange it. I declined, but I might try again
>>> and live without the touchpad for a time since the laptop just went on
>>> sale this morning.
>>>
>>> As such, when, roughly, do you think the official patch will land?
>>> Kernel 4.17 or 4.18? (Supposedly Ubuntu 18.10 will have 4.18, if the
>>> stars align.)
>>
>> I would expect 4.18 given that 4.17 is more or less a done deal, but
>> once the patch is out I expect it to also be cherry-picked as a bug-fix
>> to the stable releases of older kernels.
>>
>> Jarkko, I did not see an official patch for this yet. I'm not on the
>> list though, so I don't know if it was not posted at all, or if you did
>> not Cc me? (Not Cc-ing me is fine, I'm only sideways involved.)
>>
>> Regards,
>>
>> Hans
>>
>>>
>>> Thanks!
>>>
>>> On Sun, May 20, 2018 at 3:13 AM, Hans de Goede <hdegoede@xxxxxxxxxx>
>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> On 20-05-18 04:34, wereturtle wrote:
>>>>>
>>>>> Some bad news to follow up the good news.
>>>>>
>>>>> Installing the patched Kernel for my touchpad had a negative side
>>>>> effect. While running the patched Kernel, I didn't have any issues.
>>>>> However, I couldn't get the Nvidia driver to install with this Kernel.
>>>>> As such, I tried rebooting into my old 4.15 Kernel. Even after
>>>>> removing the patched Kernel and reinstalling the Nvidia driver several
>>>>> times for 4.15, my computer became sluggish during browsing, typing,
>>>>> etc. Games were locking up or else having a huge framerate drop. My
>>>>> CPU cores were spinning like crazy without even any processes taking
>>>>> up CPU.
>>>>>
>>>>> Further investigation revealed an "unexpected IRQ trap at vector 9a"
>>>>> error message at startup and shutdown in the console. The message
>>>>> fires constantly. Under /proc/interrupts, it was listing intel-gpio
>>>>> (i2c) for 9a. It was firing off like crazy. I think that's for my
>>>>> touchpad?
>>>>>
>>>>> I tried reinstalling Kubuntu altogether, twice, and it wouldn't stop.
>>>>> It's like that patch permanently wrote something to my hardware? I
>>>>> tried installing 4.17 RC5 with Ukuu without the touchpad patch built
>>>>> in. The intel-gpio interrupts went away, and the computer is snappy
>>>>> and responsive again. However, rebooting back into 4.15 resulted in
>>>>> the interrupts returning.
>>>>>
>>>>> How did this patch end up doing something permanently to my computer,
>>>>> and what can I do to undo it for Kernel 4.15 so that I can use my
>>>>> Nvidia drivers again?
>>>>
>>>> Have you tried turning off the computer and removing the battery,
>>>> then wait 5 minutes and put the battery back again?
>>>>
>>>> Likely having the touchpad properly working results in either the
>>>> touchpad or the i2c controller firing an interrupt at boot because of
>>>> state left over from the previous boot with working touchpad. Since you
>>>> now lack a working driver, nothing acks the interrupt and it keeps firing.
>>>>
>>>> The proper solution here would be to build 4.15 with the fix, or
>>>> see if there are some patches to the nvidia driver to make it build
>>>> with 4.17, which avoids the need for another kernel build.
>>>>
>>>> Regards,
>>>>
>>>> Hans
>>>>
>>>>>
>>>>> Also, I don't notice this same sluggishness with Windows 10. Windows
>>>>> is snappy regardless of the Kernel.
>>>>>
>>>>> On Sat, May 19, 2018 at 12:42 PM, wereturtle <wereturtledev@xxxxxxxxx>
>>>>> wrote:
>>>>>>
>>>>>> Hi everyone!
>>>>>>
>>>>>> I set the clk_rate to be 216000000 in my own patched Kernel 4.17 RC5,
>>>>>> and my touchpad now works!
>>>>>>
>>>>>> Thank you so much!
>>>>>>
>>>>>> On Fri, May 18, 2018 at 12:39 AM, Hans de Goede <hdegoede@xxxxxxxxxx>
>>>>>> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 18-05-18 09:32, Hans de Goede wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On 17-05-18 20:14, Dmitry Torokhov wrote:
>>>>>>>>>
>>>>>>>>> On Thu, May 17, 2018 at 2:36 AM, Benjamin Tissoires
>>>>>>>>> <benjamin.tissoires@xxxxxxxxx> wrote:
>>>>>>>>>>
>>>>>>>>>> Scope (_SB.PCI0.I2C1)
>>>>>>>>>> {
>>>>>>>>>>     Device (ETPD)
>>>>>>>>>>     {
>>>>>>>>>>         Name (SBFB, ResourceTemplate ()
>>>>>>>>>>         {
>>>>>>>>>>             I2cSerialBusV2 (0x004C, ControllerInitiated, 0x00061A80,
>>>>>>>>>>                 AddressingMode7Bit, "\\_SB.PCI0.I2C1",
>>>>>>>>>>                 0x00, ResourceConsumer, _Y34, Exclusive,
>>>>>>>>>>                 )
>>>>>>>>>>         })
>>>>>>>>>>         Name (SBFI, ResourceTemplate ()
>>>>>>>>>>         {
>>>>>>>>>>             Interrupt (ResourceConsumer, Level, ActiveHigh, Exclusive, ,, )
>>>>>>>>>>             {
>>>>>>>>>>                 0x0000005F,
>>>>>>>>>>             }
>>>>>>>>>>         })
>>>>>>>>>
>>>>>>>>> ...
>>>>>>>>>>
>>>>>>>>>> So nothing scary, the interrupt is a plain interrupt, not a GPIO. I
>>>>>>>>>> guess the issue lies in i2c-designware and the AMD
>>>>>>>>>> implementation...
>>>>>>>>>
>>>>>>>>> Also, in dmesg we have:
>>>>>>>>>
>>>>>>>>> [ 25.020612] cannonlake-pinctrl INT3450:00: pin 26 cannot be used as IRQ
>>>>>>>>> [ 25.020615] genirq: Setting trigger mode 3 for irq 137 failed (intel_gpio_irq_type+0x0/0x140)
>>>>>>>>> [ 25.023113] intel-lpss 0000:00:15.1: enabling device (0000 -> 0002)
>>>>>>>>> [ 25.023336] idma64 idma64.1: Found Intel integrated DMA 64-bit
>>>>>>>>> [ 25.025326] i2c_hid i2c-ELAN1201:00: i2c-ELAN1201:00 supply vdd not found, using dummy regulator
>>>>>>>>> [ 25.025494] i2c_designware i2c_designware.1: i2c_dw_handle_tx_abort: lost arbitration
>>>>>>>>> [ 25.025652] i2c_designware i2c_designware.1: i2c_dw_handle_tx_abort: lost arbitration
>>>>>>>>> [ 25.025811] i2c_designware i2c_designware.1: i2c_dw_handle_tx_abort: lost arbitration
>>>>>>>>> [ 25.025970] i2c_designware i2c_designware.1: i2c_dw_handle_tx_abort: lost arbitration
>>>>>>>>> [ 25.025972] i2c_hid i2c-ELAN1201:00: hid_descr_cmd failed
>>>>>>>>>
>>>>>>>>> 0x5F is kind of high for a plain interrupt; I wonder if ACPI table
>>>>>>>>> relies on static gpio->virq mapping that could be different on
>>>>>>>>> Linux... Also I am surprised the IRQ is active-HIGH, normally it is
>>>>>>>>> active low. Might want to try and hack the driver to force it to low
>>>>>>>>> and see what happens...
>>>>>>>>
>>>>>>>> Yes the interrupt is definitely suspect. Actually using plain
>>>>>>>> interrupts rather than a GpioInt is something which I would only
>>>>>>>> expect to see in old DSDTs and not in recent ones, because for i2c
>>>>>>>> devices there is no clear parent interrupt controller and as such no
>>>>>>>> well defined way to properly interpret a raw Interrupt number.
>>>>>>>>
>>>>>>>> What is with the AMD reference btw, the above dmesg snippet looks
>>>>>>>> to be about an Intel system?
>>>>>>>> I would not expect cannonlake-pinctrl to be used on an AMD system...
>>>>>>>>
>>>>>>>> If this indeed is an Intel system one thing worth trying is
>>>>>>>> changing the clk_freq we use in the i2c-designware driver to
>>>>>>>> calculate clk high and low counts. There are some reports from
>>>>>>>> people attaching scopes to the i2c wires that on kabylake at
>>>>>>>> least the driver is driving the bus about 1.5 times the
>>>>>>>> expected rate (so 600KHz instead of 400KHz).
>>>>>>>>
>>>>>>>> A workaround for now would be to edit:
>>>>>>>>
>>>>>>>> drivers/mfd/intel-lpss-pci.c
>>>>>>>>
>>>>>>>> And change clk_rate in:
>>>>>>>>
>>>>>>>> static const struct intel_lpss_platform_info spt_i2c_info = {
>>>>>>>> 	.clk_rate = 120000000,
>>>>>>>> 	.properties = spt_i2c_properties,
>>>>>>>> };
>>>>>>>>
>>>>>>>> From 120000000 to 180000000. People are still working on getting
>>>>>>>> to the bottom of this but it is worth a shot. The clk_rate
>>>>>>>> value here is only used to calculate i2c timings and does
>>>>>>>> not actually program a clock, it only specifies the frequency
>>>>>>>> the clock is expected to be running at. So changing this should
>>>>>>>> be safe.
>>>>>>>
>>>>>>> Ok, so I just read the new mails in the threads where this is being
>>>>>>> discussed and it has been confirmed by Intel that for all Canon Lake
>>>>>>> devices the correct clk_rate is 216000000, which likely explains
>>>>>>> the i2c errors here. Jarkko (added to the Cc) is working on a patch
>>>>>>> for this.
>>>>>>>
>>>>>>> For now if you can build your own kernels you can make the change I
>>>>>>> suggested above, but that will also change the clock-rate on other
>>>>>>> machines, so that is just for testing on Canon Lake hardware!
>>>>>>>
>>>>>>> The way the Interrupt is specified is still suspicious btw, but
>>>>>>> we'll cross that bridge when we get there.
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Hans
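Why a wrong clk_rate changes the bus speed can be illustrated with a small C sketch. It assumes a simplified model in which the driver converts the desired SCL high and low times into cycle counts using the clk_rate it *assumes*, while the hardware actually clocks those counts at its real input rate. The helper names ns_to_cycles and effective_scl_hz and the 1100 ns high / 1400 ns low split (2500 ns, i.e. 400 kHz, matching the 0x00061A80 = 400000 Hz requested by the ACPI I2cSerialBusV2 entry) are hypothetical; the real i2c-designware code also folds in rise/fall times and fixed hardware offsets, which are omitted here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 400 kHz split: 1100 ns high + 1400 ns low = 2500 ns. */
#define T_HIGH_NS 1100
#define T_LOW_NS  1400

/*
 * Simplified model (an assumption, not the driver's actual code): turn a
 * desired time in ns into input-clock cycles using the clock rate the
 * driver assumes (clk_rate), rounding to the nearest cycle.
 */
static uint32_t ns_to_cycles(uint64_t assumed_clk_hz, uint64_t t_ns)
{
	return (uint32_t)((assumed_clk_hz * t_ns + 500000000ULL) / 1000000000ULL);
}

/*
 * SCL frequency that results when counts computed for assumed_clk_hz are
 * clocked by the controller's real input clock, real_clk_hz.
 */
static uint64_t effective_scl_hz(uint64_t assumed_clk_hz, uint64_t real_clk_hz,
				 uint64_t t_high_ns, uint64_t t_low_ns)
{
	uint32_t hcnt = ns_to_cycles(assumed_clk_hz, t_high_ns);
	uint32_t lcnt = ns_to_cycles(assumed_clk_hz, t_low_ns);

	assert(hcnt + lcnt > 0);
	/* One SCL period lasts (hcnt + lcnt) cycles of the real clock. */
	return real_clk_hz / (hcnt + lcnt);
}
```

Under this model, counts computed for 120000000 give 400000 Hz when the clock really is 120 MHz, 600000 Hz if it actually runs at 180 MHz (the 1.5x ratio from the scope reports), and 720000 Hz at 216 MHz (1.8x). With the assumed rate also set to 216000000, effective_scl_hz(216000000, 216000000, T_HIGH_NS, T_LOW_NS) comes back to 400000, which is consistent with changing only clk_rate being enough, since it feeds nothing but the timing calculation.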