On Thu, Nov 19, 2020 at 03:42:35PM +0100, Hans de Goede wrote:
> Hi,
>
> On 11/19/20 7:36 AM, Jerry Snitselaar wrote:
> >
> > Matthew Garrett @ 2020-10-15 15:39 MST:
> >
> >> On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <jsnitsel@xxxxxxxxxx> wrote:
> >>>
> >>> There is a misconfiguration in the bios of the gpio pin used for the
> >>> interrupt in the T490s. When interrupts are enabled in the tpm_tis
> >>> driver code this results in an interrupt storm. This was initially
> >>> reported when we attempted to enable the interrupt code in the tpm_tis
> >>> driver, which previously wasn't setting a flag to enable it. Due to
> >>> the reports of the interrupt storm that code was reverted and we went back
> >>> to polling instead of using interrupts. Now that we know the T490s problem
> >>> is a firmware issue, add code to check if the system is a T490s and
> >>> disable interrupts if that is the case. This will allow us to enable
> >>> interrupts for everyone else. If the user has a fixed bios they can
> >>> force the enabling of interrupts with tpm_tis.interrupts=1 on the
> >>> kernel command line.
> >>
> >> I think an implication of this is that systems haven't been
> >> well-tested with interrupts enabled. In general when we've found a
> >> firmware issue in one place it ends up happening elsewhere as well, so
> >> it wouldn't surprise me if there are other machines that will also be
> >> unhappy with interrupts enabled. Would it be possible to automatically
> >> detect this case (eg, if we get more than a certain number of
> >> interrupts in a certain timeframe immediately after enabling the
> >> interrupt) and automatically fall back to polling in that case? It
> >> would also mean that users with fixed firmware wouldn't need to pass a
> >> parameter.
> >
> > I believe Matthew is correct here. I found another system today
> > with completely different vendor for both the system and the tpm chip.
> > In addition another Lenovo model, the L490, has the issue.
> >
> > This initial attempt at a solution like Matthew suggested works on
> > the system I found today, but I imagine it is all sorts of wrong.
> > In the 2 systems where I've seen it, there are about 100000 interrupts
> > in around 1.5 seconds, and then the irq code shuts down the interrupt
> > because they aren't being handled.
>
> Is that with your patch? The IRQ should be silenced as soon as
> devm_free_irq(chip->dev.parent, priv->irq, chip); is called.
>
> Depending on if we can get your storm-detection to work or not,
> we might also choose to just never try to use the IRQ (at least on
> x86 systems). AFAIK the TPM is never used for high-throughput stuff
> so the polling overhead should not be a big deal (and I'm getting the feeling
> that Windows always polls).
>
> Regards,
>
> Hans

Yeah, this is what I've been wondering for a while. Why could not we just
strip off IRQ code? Why does it matter?

/Jarkko
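
For reference, the storm detection Matthew describes (count interrupts in a short window right after enabling the IRQ, fall back to polling if there are too many) could look roughly like the sketch below. This is not Jerry's patch and not tpm_tis code: struct storm_detector, both function names, and the two thresholds are hypothetical, and the caller passes in timestamps so the logic can be exercised outside the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical thresholds, chosen for illustration only. */
#define STORM_WINDOW_NS (100ULL * 1000 * 1000) /* 100 ms counting window */
#define STORM_MAX_IRQS  1000u                  /* more than this per window is a storm */

/* Hypothetical per-device state; not a structure from the tpm_tis driver. */
struct storm_detector {
	uint64_t window_start_ns; /* time at which the current window opened */
	unsigned int count;       /* interrupts counted in the current window */
	bool tripped;             /* latched once a storm has been seen */
};

/* Start counting from now_ns, e.g. right after the interrupt is enabled. */
static void storm_detector_init(struct storm_detector *d, uint64_t now_ns)
{
	assert(d);
	d->window_start_ns = now_ns;
	d->count = 0;
	d->tripped = false;
}

/*
 * Record one interrupt that arrived at now_ns.
 * Returns true once the caller should give up on the IRQ and poll instead.
 */
static bool storm_detector_note_irq(struct storm_detector *d, uint64_t now_ns)
{
	assert(d);
	if (d->tripped)
		return true;

	/* Window expired: open a new one and restart the count. */
	if (now_ns - d->window_start_ns >= STORM_WINDOW_NS) {
		d->window_start_ns = now_ns;
		d->count = 0;
	}

	if (++d->count > STORM_MAX_IRQS)
		d->tripped = true;

	return d->tripped;
}
```

In a driver, the interrupt handler would call something like storm_detector_note_irq() and, on a true return, arrange for the IRQ to be released (the devm_free_irq() call Hans mentions) and for polling to take over. At the rate Jerry reports, about 100000 interrupts in 1.5 seconds or one every 15 us, these example thresholds would trip after roughly 15 ms.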