Hi, On 1/3/22 13:34, Jarkko Nikula wrote: > Hi > > On 1/3/22 12:42, Hans de Goede wrote: >> Hi Jarkko, >> >> On 1/3/22 10:42, Jarkko Nikula wrote: >>> Hi >>> >>> We have a Braswell based preproduction HW. I noticed linux-next tag next-20211224 doesn't boot on it due to following error: >>> >>> [ 34.674271] cherryview-pinctrl INT33FF:00: interrupt on unused interrupt line 0 >>> [ 34.682476] cherryview-pinctrl INT33FF:00: interrupt on unused interrupt line 0 >>> [ 34.690681] cherryview-pinctrl INT33FF:00: interrupt on unused interrupt line 0 >>> ... >>> >>> Linux v5.16-rc8 is ok. I found the following commit to be reason for the regression: >>> >>> bdfbef2d29dc ("pinctrl: cherryview: Don't use selection 0 to mark an interrupt line as unused") >> >> Thank you for the timely headsup about this. >> >> I assume that you have tried a revert (if necessary including reverting some >> of the follow ups) and that reverting the patch you point to fixes the >> issue, right ? >> > Yes since linux-next has only these three commits below to pinctrl-cherryview.c that are not in v5.16-rc8 I reverted them one by one. I often try these kind of simple steps before going to more labor git bisect :-) > > db1b2a8caf5b pinctrl: cherryview: Use temporary variable for struct device > 07199dbf8cae pinctrl: cherryview: Do not allow the same interrupt line to be used by 2 pins > bdfbef2d29dc pinctrl: cherryview: Don't use selection 0 to mark an interrupt line as unused > > I also double checked by checking out to bdfbef2d29dc and tested that commit and it reverted. > >> Can you try the 2 attached patches *one at a time* ? : >> >> 1. Restores the old behavior of just triggering hwirq 0 of >> the pincontroller-domain when we don't know the mapping >> > Patch 1 does work and here's the output from modified error print: > > [ 13.550781] cherryview-pinctrl INT33FF:00: interrupt on unmapped interrupt line 0 > > If you want to go with patch 1 you may add my > Tested-by: Jarkko Nikula <jarkko.nikula@xxxxxxxxxxxxxxx> > >> 2. Restores the old behavior which allows chv_gpio_irq_startup() >> to overwrite the interrupt-line to pin mapping if the current >> mapping points to pin 0 >> > Patch 2 alone doesn't: > > [ 24.977116] cherryview-pinctrl INT33FF:00: interrupt on unused interrupt line 0 > [ 24.985314] cherryview-pinctrl INT33FF:00: interrupt on unused interrupt line 0 > [ 24.993513] cherryview-pinctrl INT33FF:00: interrupt on unused interrupt line 0 > ... Ok, that is good news actually, I was already hoping that patch 1 would fix things. >> Both of these restore old behavior caused by a mapping-table >> entry containing 0 meaning both unset as well as HWIRQ0 >> before the patch in question. >> >> If applying them one at a time does not help, please also try with >> both applied. >> >> These 2 patches should apply cleanly on top of linux-next. >> >> If patch 2 fixes things it would be interesting if you can grab a >> dmesg with "pinctrl-cherryview.dyndbg" added to the command line >> (with the patched kernel). >> > With both applied it does work: > > # dmesg |grep cherryview-pinctrl > [ 15.465425] cherryview-pinctrl INT33FF:00: interrupt on unmapped interrupt line 0 > [ 16.562282] cherryview-pinctrl INT33FF:03: using interrupt line 0 for pin 81 > [ 17.824905] cherryview-pinctrl INT33FF:02: using interrupt line 0 for pin 22 > [ 17.835757] cherryview-pinctrl INT33FF:03: using interrupt line 2 for pin 77 > [ 17.850170] cherryview-pinctrl INT33FF:00: using interrupt line 0 for pin 76 Oh, that is actually interesting, this is a per gpio controller thing, so if we filter on the controller then we end up with: [ 15.465425] cherryview-pinctrl INT33FF:00: interrupt on unmapped interrupt line 0 [ 17.850170] cherryview-pinctrl INT33FF:00: using interrupt line 0 for pin 76 So we do eventually get an IRQ request for a pin using the GPIO controller internal interrupt-line 0. But the IRQ triggers at least once before then and even though we haven't set a handler yet, calling generic_handle_irq for the GPIO-chips irqdomain, offset 0 IRQ does manage to silence the interrupt. I've been tracing this through the kernel code and AFAICT we end up in: arch/x86/kernel/irq.c: ack_bad_irq() in this case: Which means that this message should show up in dmesg: if (printk_ratelimit()) pr_err("unexpected IRQ trap at vector %02x\n", irq); Can you confirm this? Also can you share the full dmesg output of this device (with both patches, with dyndbg option) ? ### Note what we are seeing here is basically a spurious IRQ. Except on a few known broken devices the cherryview pinctrl code relies on the BIOS having configured things so that there are no spurious IRQs. I've attached a quick hack which activates the workaround for known broken devices unconditionally. This replace my previous 2 patches. I expect this to fix things too. If you can make some time to give this one a test too that would be great, then we have some options on how to fix this :) Regards, Hans
From 48c739b102051b71a9d4de2d128f4f2633cd668d Mon Sep 17 00:00:00 2001 From: Hans de Goede <hdegoede@xxxxxxxxxx> Date: Mon, 3 Jan 2022 17:31:36 +0100 Subject: [PATCH] pinctrl: cherryview: Hack to try and workaround linux-next regression Signed-off-by: Hans de Goede <hdegoede@xxxxxxxxxx> --- drivers/pinctrl/intel/pinctrl-cherryview.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/pinctrl/intel/pinctrl-cherryview.c b/drivers/pinctrl/intel/pinctrl-cherryview.c index 683b95e9639a..2ee933c6304a 100644 --- a/drivers/pinctrl/intel/pinctrl-cherryview.c +++ b/drivers/pinctrl/intel/pinctrl-cherryview.c @@ -1558,13 +1558,13 @@ static int chv_gpio_irq_init_hw(struct gpio_chip *chip) * * See also https://bugzilla.kernel.org/show_bug.cgi?id=197953. */ - if (!pctrl->chip.irq.init_valid_mask) { +// if (!pctrl->chip.irq.init_valid_mask) { /* * Mask all interrupts the community is able to generate * but leave the ones that can only generate GPEs unmasked. */ chv_pctrl_writel(pctrl, CHV_INTMASK, GENMASK(31, community->nirqs)); - } +// } /* Clear all interrupts */ chv_pctrl_writel(pctrl, CHV_INTSTAT, 0xffff); -- 2.33.1