On Fri Jun 9, 2023 at 7:03 PM EEST, Lino Sanfilippo wrote: > > Hi, > > On 09.06.23 16:33, Jarkko Sakkinen wrote: > > > > > On Tue May 30, 2023 at 8:47 PM EEST, Lino Sanfilippo wrote: > >> From: Lino Sanfilippo <l.sanfilippo@xxxxxxxxxx> > >> > >> After activation of interrupts for TPM TIS drivers 0-day reports an > >> interrupt storm on an Inspur NF5180M6/NF5180M6 server. > >> > >> Fix this by detecting the storm and falling back to polling: > >> Count the number of unhandled interrupts within a 10 ms time interval. In > >> case that more than 1000 were unhandled deactivate interrupts entirely, > >> deregister the handler and use polling instead. > >> > >> The storm detection logic equals the implementation in note_interrupt() > >> which uses timestamps and counters stored in struct irq_desc. Since this > >> structure is private to the generic interrupt core the TPM TIS core uses > >> its own timestamps and counters. Furthermore the TPM interrupt handler > >> always returns IRQ_HANDLED to prevent the generic interrupt core from > >> processing the interrupt storm. > >> > >> Since the interrupt deregistration function devm_free_irq() waits for all > >> interrupt handlers to finish, only trigger a worker in the interrupt > >> handler and do the unregistration in the worker to avoid a deadlock. > >> > >> Reported-by: kernel test robot <yujie.liu@xxxxxxxxx> > >> Closes: https://lore.kernel.org/oe-lkp/202305041325.ae8b0c43-yujie.liu@xxxxxxxxx/ > >> Suggested-by: Lukas Wunner <lukas@xxxxxxxxx> > >> Signed-off-by: Lino Sanfilippo <l.sanfilippo@xxxxxxxxxx> > >> --- > > > > Sorry for the latency. I've moved home office to a new location, > > which has caused ~2 week lag. Unfortunate timing. > > > > > No prob :) > > > >> drivers/char/tpm/tpm_tis_core.c | 93 ++++++++++++++++++++++++++++----- > >> drivers/char/tpm/tpm_tis_core.h | 4 ++ > >> 2 files changed, 85 insertions(+), 12 deletions(-) > >> > >> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c > >> index 558144fa707a..7ae8228e803f 100644 > >> --- a/drivers/char/tpm/tpm_tis_core.c > >> +++ b/drivers/char/tpm/tpm_tis_core.c > >> @@ -468,25 +468,32 @@ static int tpm_tis_send_data(struct tpm_chip *chip, const u8 *buf, size_t len) > >> return rc; > >> } > >> > >> +static void __tpm_tis_disable_interrupts(struct tpm_chip *chip) > >> +{ > >> + struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); > >> + u32 intmask = 0; > >> + > >> + tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask); > >> + intmask &= ~TPM_GLOBAL_INT_ENABLE; > >> + > >> + tpm_tis_request_locality(chip, 0); > >> + tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), intmask); > >> + tpm_tis_relinquish_locality(chip, 0); > >> + > >> + chip->flags &= ~TPM_CHIP_FLAG_IRQ; > >> +} > >> + > >> static void disable_interrupts(struct tpm_chip *chip) > > > > Add tpm_ prefix here too. It makes tracing/grepping/etc so much nicer. > > Ok. > > > > >> { > >> struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); > >> - u32 intmask; > > > > int_mask is more readable > > Ok. > > > > >> - int rc; > >> > >> if (priv->irq == 0) > >> return; > >> > >> - rc = tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask); > >> - if (rc < 0) > >> - intmask = 0; > >> - > >> - intmask &= ~TPM_GLOBAL_INT_ENABLE; > >> - rc = tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), intmask); > >> + __tpm_tis_disable_interrupts(chip); > >> > >> devm_free_irq(chip->dev.parent, priv->irq, chip); > >> priv->irq = 0; > >> - chip->flags &= ~TPM_CHIP_FLAG_IRQ; > >> } > >> > >> /* > >> @@ -752,6 +759,53 @@ static bool tpm_tis_req_canceled(struct tpm_chip *chip, u8 status) > >> return status == TPM_STS_COMMAND_READY; > >> } > >> > >> +static void tpm_tis_reenable_polling(struct tpm_chip *chip) > >> +{ > >> + struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); > >> + > >> + dev_warn(&chip->dev, FW_BUG > >> + "TPM interrupt storm detected, polling instead\n"); > >> + > >> + __tpm_tis_disable_interrupts(chip); > >> + > >> + /* > >> + * devm_free_irq() must not be called from within the interrupt handler, > >> + * since this function waits for running handlers to finish and thus it > >> + * would deadlock. Instead trigger a worker that takes care of the > >> + * unregistration. > >> + */ > >> + schedule_work(&priv->free_irq_work); > >> +} > >> + > >> +static irqreturn_t tpm_tis_check_for_interrupt_storm(struct tpm_chip *chip) > >> +{ > >> + struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); > >> + const unsigned int MAX_UNHANDLED_IRQS = 1000; > > > > Please declare this in the beginning of file because it is non-empirical > > tuning parameter. I do not want it to be buried here. It is now as good > > as a magic number. > > > > Or perhaps even tpm_tis_core.h? > > > > For now that constant is only used in tpm_tis_core.c. So I would favor to define it there. > > > Why MAX_UNHANDLED_IRQS is exactly 1000 and not 1? I would rollback eagerly. > > > Because the IRQ line may be shared with another device which has raised the > interrupt instead of the TPM. So unhandled interrupts may be legit. I understand that being exact here is impossible. So let's stick to this but please move the constant to the tpm_tis_core.c with the TPM_ prefix because it is an essential tuning parameter. BR, Jarkko