On 11.05.22 at 17:06, Jarkko Sakkinen wrote: > On Mon, May 09, 2022 at 10:05:57AM +0200, Lino Sanfilippo wrote: >> From: Lino Sanfilippo <l.sanfilippo@xxxxxxxxxx> >> >> The interrupt handler that sets irq_tested to indicate that interrupts are >> working may run on another CPU than the thread that checks this variable in >> tmp_tis_send(). >> >> Since no synchronization is used to access irq_tested, there is no >> guarantee for cache coherency between the CPUs, so that the value set by >> the interrupt handler might not be visible to the testing thread. >> >> Avoid this issue by using a bitfield instead of a boolean variable and by >> accessing this field with bit manipulating functions that guarantee cache >> coherency. >> >> Signed-off-by: Lino Sanfilippo <l.sanfilippo@xxxxxxxxxx> >> --- >> drivers/char/tpm/tpm_tis_core.c | 13 +++++++------ >> drivers/char/tpm/tpm_tis_core.h | 6 +++++- >> 2 files changed, 12 insertions(+), 7 deletions(-) >> >> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c >> index 4f3b82c3f205..bdfde1cd71fe 100644 >> --- a/drivers/char/tpm/tpm_tis_core.c >> +++ b/drivers/char/tpm/tpm_tis_core.c >> @@ -470,7 +470,8 @@ static int tpm_tis_send(struct tpm_chip *chip, u8 *buf, size_t len) >> int rc, irq; >> struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); >> >> - if (!(chip->flags & TPM_CHIP_FLAG_IRQ) || priv->irq_tested) >> + if (!(chip->flags & TPM_CHIP_FLAG_IRQ) || >> + test_bit(TPM_TIS_IRQTEST_OK, &priv->irqtest_flags)) >> return tpm_tis_send_main(chip, buf, len); >> >> /* Verify receipt of the expected IRQ */ >> @@ -480,11 +481,11 @@ static int tpm_tis_send(struct tpm_chip *chip, u8 *buf, size_t len) >> rc = tpm_tis_send_main(chip, buf, len); >> priv->irq = irq; >> chip->flags |= TPM_CHIP_FLAG_IRQ; >> - if (!priv->irq_tested) >> + if (!test_bit(TPM_TIS_IRQTEST_OK, &priv->irqtest_flags)) >> tpm_msleep(1); >> - if (!priv->irq_tested) >> + if (!test_bit(TPM_TIS_IRQTEST_OK, &priv->irqtest_flags)) >> disable_interrupts(chip); >> - priv->irq_tested = true; >> + set_bit(TPM_TIS_IRQTEST_OK, &priv->irqtest_flags); >> return rc; >> } >> >> @@ -689,7 +690,7 @@ static irqreturn_t tis_int_handler(int dummy, void *dev_id) >> if (interrupt == 0) >> return IRQ_NONE; >> >> - priv->irq_tested = true; >> + set_bit(TPM_TIS_IRQTEST_OK, &priv->irqtest_flags); >> if (interrupt & TPM_INTF_DATA_AVAIL_INT) >> wake_up_interruptible(&priv->read_queue); >> if (interrupt & TPM_INTF_LOCALITY_CHANGE_INT) >> @@ -780,7 +781,7 @@ static int tpm_tis_probe_irq_single(struct tpm_chip *chip, u32 intmask, >> if (rc < 0) >> return rc; >> >> - priv->irq_tested = false; >> + clear_bit(TPM_TIS_IRQTEST_OK, &priv->irqtest_flags); >> chip->flags |= TPM_CHIP_FLAG_IRQ; >> >> /* Generate an interrupt by having the core call through to >> diff --git a/drivers/char/tpm/tpm_tis_core.h b/drivers/char/tpm/tpm_tis_core.h >> index 43b724e55192..c8972ea8e13e 100644 >> --- a/drivers/char/tpm/tpm_tis_core.h >> +++ b/drivers/char/tpm/tpm_tis_core.h >> @@ -89,11 +89,15 @@ enum tpm_tis_flags { >> TPM_TIS_USE_THREADED_IRQ = BIT(2), >> }; >> >> +enum tpm_tis_irqtest_flags { >> + TPM_TIS_IRQTEST_OK = BIT(0), >> +}; >> + >> struct tpm_tis_data { >> u16 manufacturer_id; >> int locality; >> int irq; >> - bool irq_tested; >> + unsigned long irqtest_flags; >> unsigned long flags; >> void __iomem *ilb_base_addr; >> u16 clkrun_enabled; >> -- >> 2.36.0 >> > > So, this would caused by changing to threaded IRQs? > > BR, Jarkko > No, this is caused by the fact that we access irq_tested from different paths of execution. The writing and reading to/from irq_tested from different execution paths without any synchronization is a plain bug in the existing code: We simply cannot guarantee that the value we set for irq_tested in the interrupt handler which may run on CPU 1 is visible in tpm_tis_send() which may run on CPU2. This is because there is no enforced data transfer between the CPUs and thus the written value may end up in the CPU cache of CPU 1 and never (or at least not within the timout of 1 ms as used in tpm_tis_send()) be seen by CPU 2. This kind of issues are in length described in memory-barriers.txt. The patch is supposed to fix this bug. Regards, Lino