Hi, On 24.05.23 05:58, Jarkko Sakkinen wrote: >> Commit e644b2f498d2 ("tpm, tpm_tis: Enable interrupt test") enabled >> interrupts instead of polling on all capable TPMs. Unfortunately, on some >> products the interrupt line is either never asserted or never deasserted. > > Use Reported-by and Closes tag and remove this paragraph. > > In Closes link instead from lore the email where the bug was reported. Ok > >> The former causes interrupt timeouts and is detected by >> tpm_tis_core_init(). The latter results in interrupt storms. > > Please describe instead the system where the bug was realized. Don't > worry about speculative descriptions. We only deal with ones actually > realized. > >> Recent reports concern the Lenovo ThinkStation P360 Tiny, Lenovo ThinkPad >> L490 and Inspur NF5180M6: >> >> https://lore.kernel.org/linux-integrity/20230511005403.24689-1-jsnitsel@xxxxxxxxxx/ >> https://lore.kernel.org/linux-integrity/d80b180a569a9f068d3a2614f062cfa3a78af5a6.camel@xxxxxxxxxx/ > > Please remove all of this, as the fixes have been handled. Let's keep > the commit message focused. > Ok. >> The current approach to avoid those storms is to disable interrupts by >> adding a DMI quirk for the concerned device. >> >> However this is a maintenance burden in the long run, so use a generic >> approach: >> >> Detect an interrupt storm by counting the number of unhandled interrupts >> within a 10 ms time interval. In case that more than 1000 were unhandled >> deactivate interrupts, deregister the handler and fall back to polling. >> >> This equals the implementation that handles interrupt storms in >> note_interrupt() by means of timestamps and counters in struct irq_desc. >> However the function to access this structure is private so the logic has >> to be reimplemented in the TPM TIS core. > > I only now found out that this is based on kernel/irq/spurious.c code > partly. Why this was unmentioned? > > That would make this already more legitimate because it is based > on field tested metrics. >> Then we only have to discuss about counter. I mentioned this in one of my responses: https://lore.kernel.org/all/da435e0d-5f22-fac7-bc10-96a0fd4c6d54@xxxxxxxxxx/ > >> routine trigger a worker thread that executes the unregistration. >> >> Suggested-by: Lukas Wunner <lukas@xxxxxxxxx> >> Signed-off-by: Lino Sanfilippo <l.sanfilippo@xxxxxxxxxx> >> --- >> drivers/char/tpm/tpm_tis_core.c | 71 +++++++++++++++++++++++++++++++-- >> drivers/char/tpm/tpm_tis_core.h | 6 +++ >> 2 files changed, 74 insertions(+), 3 deletions(-) >> >> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c >> index 558144fa707a..458ebf8c2f16 100644 >> --- a/drivers/char/tpm/tpm_tis_core.c >> +++ b/drivers/char/tpm/tpm_tis_core.c >> @@ -752,6 +752,55 @@ static bool tpm_tis_req_canceled(struct tpm_chip *chip, u8 status) >> return status == TPM_STS_COMMAND_READY; >> } >> >> +static void tpm_tis_handle_irq_storm(struct tpm_chip *chip) >> +{ >> + struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); >> + int intmask = 0; >> + >> + dev_err(&chip->dev, HW_ERR >> + "TPM interrupt storm detected, polling instead\n"); > > Degrading this to warn is fine because it is legit behaviour in a > sense. > Agreed. >> + >> + tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask); >> + >> + intmask &= ~TPM_GLOBAL_INT_ENABLE; >> + >> + tpm_tis_request_locality(chip, 0); >> + tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), intmask); >> + tpm_tis_relinquish_locality(chip, 0); >> + >> + chip->flags &= ~TPM_CHIP_FLAG_IRQ; >> + >> + /* >> + * We must not call devm_free_irq() from within the interrupt handler, > > Never use "we" form. Always use either: > > 1. Imperative > 2. Passive > > I.e. to address this, you would write instead "devm_free_irq() must not > be called within the interrupt handler because ...". > Ok. >> + * since this function waits for running interrupt handlers to finish >> + * and thus it would deadlock. Instead trigger a worker that does the >> + * unregistration. >> + */ >> + schedule_work(&priv->free_irq_work); >> +} >> + >> +static void tpm_tis_process_unhandled_interrupt(struct tpm_chip *chip) >> +{ >> + const unsigned int MAX_UNHANDLED_IRQS = 1000; >> + struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); > > Reverse order and add empty line. > Ok. >> + /* >> + * The worker to free the TPM interrupt (free_irq_work) may already >> + * be scheduled, so make sure it is not scheduled again. >> + */ >> + if (!(chip->flags & TPM_CHIP_FLAG_IRQ)) >> + return; >> + >> + if (time_after(jiffies, priv->last_unhandled_irq + HZ/10)) >> + priv->unhandled_irqs = 1; >> + else >> + priv->unhandled_irqs++; >> + >> + priv->last_unhandled_irq = jiffies; >> + >> + if (priv->unhandled_irqs > MAX_UNHANDLED_IRQS) >> + tpm_tis_handle_irq_storm(chip); > > Why wouldn't we switch to polling mode even when there is a single > unhandled IRQ? > As Lukas wrote, not handling an interrupt may be legit if it was raised by a device that shares the interrupt line with the TPM. Regards, Lino