Hi, On 09.06.23 16:33, Jarkko Sakkinen wrote: > > On Tue May 30, 2023 at 8:47 PM EEST, Lino Sanfilippo wrote: >> From: Lino Sanfilippo <l.sanfilippo@xxxxxxxxxx> >> >> After activation of interrupts for TPM TIS drivers 0-day reports an >> interrupt storm on an Inspur NF5180M6/NF5180M6 server. >> >> Fix this by detecting the storm and falling back to polling: >> Count the number of unhandled interrupts within a 10 ms time interval. In >> case that more than 1000 were unhandled deactivate interrupts entirely, >> deregister the handler and use polling instead. >> >> The storm detection logic equals the implementation in note_interrupt() >> which uses timestamps and counters stored in struct irq_desc. Since this >> structure is private to the generic interrupt core the TPM TIS core uses >> its own timestamps and counters. Furthermore the TPM interrupt handler >> always returns IRQ_HANDLED to prevent the generic interrupt core from >> processing the interrupt storm. >> >> Since the interrupt deregistration function devm_free_irq() waits for all >> interrupt handlers to finish, only trigger a worker in the interrupt >> handler and do the unregistration in the worker to avoid a deadlock. >> >> Reported-by: kernel test robot <yujie.liu@xxxxxxxxx> >> Closes: https://lore.kernel.org/oe-lkp/202305041325.ae8b0c43-yujie.liu@xxxxxxxxx/ >> Suggested-by: Lukas Wunner <lukas@xxxxxxxxx> >> Signed-off-by: Lino Sanfilippo <l.sanfilippo@xxxxxxxxxx> >> --- > > Sorry for the latency. I've moved home office to a new location, > which has caused ~2 week lag. Unfortunate timing. > No prob :) >> drivers/char/tpm/tpm_tis_core.c | 93 ++++++++++++++++++++++++++++----- >> drivers/char/tpm/tpm_tis_core.h | 4 ++ >> 2 files changed, 85 insertions(+), 12 deletions(-) >> >> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c >> index 558144fa707a..7ae8228e803f 100644 >> --- a/drivers/char/tpm/tpm_tis_core.c >> +++ b/drivers/char/tpm/tpm_tis_core.c >> @@ -468,25 +468,32 @@ static int tpm_tis_send_data(struct tpm_chip *chip, const u8 *buf, size_t len) >> return rc; >> } >> >> +static void __tpm_tis_disable_interrupts(struct tpm_chip *chip) >> +{ >> + struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); >> + u32 intmask = 0; >> + >> + tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask); >> + intmask &= ~TPM_GLOBAL_INT_ENABLE; >> + >> + tpm_tis_request_locality(chip, 0); >> + tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), intmask); >> + tpm_tis_relinquish_locality(chip, 0); >> + >> + chip->flags &= ~TPM_CHIP_FLAG_IRQ; >> +} >> + >> static void disable_interrupts(struct tpm_chip *chip) > > Add tpm_ prefix here too. It makes tracing/grepping/etc so much nicer. Ok. > >> { >> struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); >> - u32 intmask; > > int_mask is more readable Ok. > >> - int rc; >> >> if (priv->irq == 0) >> return; >> >> - rc = tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask); >> - if (rc < 0) >> - intmask = 0; >> - >> - intmask &= ~TPM_GLOBAL_INT_ENABLE; >> - rc = tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), intmask); >> + __tpm_tis_disable_interrupts(chip); >> >> devm_free_irq(chip->dev.parent, priv->irq, chip); >> priv->irq = 0; >> - chip->flags &= ~TPM_CHIP_FLAG_IRQ; >> } >> >> /* >> @@ -752,6 +759,53 @@ static bool tpm_tis_req_canceled(struct tpm_chip *chip, u8 status) >> return status == TPM_STS_COMMAND_READY; >> } >> >> +static void tpm_tis_reenable_polling(struct tpm_chip *chip) >> +{ >> + struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); >> + >> + dev_warn(&chip->dev, FW_BUG >> + "TPM interrupt storm detected, polling instead\n"); >> + >> + __tpm_tis_disable_interrupts(chip); >> + >> + /* >> + * devm_free_irq() must not be called from within the interrupt handler, >> + * since this function waits for running handlers to finish and thus it >> + * would deadlock. Instead trigger a worker that takes care of the >> + * unregistration. >> + */ >> + schedule_work(&priv->free_irq_work); >> +} >> + >> +static irqreturn_t tpm_tis_check_for_interrupt_storm(struct tpm_chip *chip) >> +{ >> + struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); >> + const unsigned int MAX_UNHANDLED_IRQS = 1000; > > Please declare this in the beginning of file because it is non-empirical > tuning parameter. I do not want it to be buried here. It is now as good > as a magic number. > > Or perhaps even tpm_tis_core.h? > For now that constant is only used in tpm_tis_core.c. So I would favor to define it there. > Why MAX_UNHANDLED_IRQS is exactly 1000 and not 1? I would rollback eagerly. Because the IRQ line may be shared with another device which has raised the interrupt instead of the TPM. So unhandled interrupts may be legit. Regards, Lino