On Wed Apr 19, 2023 at 6:41 PM EEST, Sebastian Andrzej Siewior wrote: > From: Haris Okanovic <haris.okanovic@xxxxxx> > > ioread8() operations to TPM MMIO addresses can stall the CPU when > immediately following a sequence of iowrite*()'s to the same region. > > For example, cyclitest measures ~400us latency spikes when a non-RT > usermode application communicates with an SPI-based TPM chip (Intel Atom > E3940 system, PREEMPT_RT kernel). The spikes are caused by a > stalling ioread8() operation following a sequence of 30+ iowrite8()s to > the same address. I believe this happens because the write sequence is > buffered (in CPU or somewhere along the bus), and gets flushed on the > first LOAD instruction (ioread*()) that follows. > > The enclosed change appears to fix this issue: read the TPM chip's > access register (status code) after every iowrite*() operation to > amortize the cost of flushing data to chip across multiple instructions. > > Signed-off-by: Haris Okanovic <haris.okanovic@xxxxxx> > Link: https://lore.kernel.org/r/20230323153436.B2SATnZV@xxxxxxxxxxxxx > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> > --- > v1…v2: > - Updated/ added comments as per Jarkko Sakkinen. > > On 2023-03-30 03:25:34 [+0300], Jarkko Sakkinen wrote: > … > > I would replace this with: > > > > /* > > * Flush previous write operations with a dummy read operation to the > > * TPM MMIO base address. > > */ > > > > I think rest of the reasoning would be better place to the functions, > > which are call sites for this helper, and here it would be make more > > sense to explain what it actually does. > … > > > > Thanks for catching this up. It is a small code change but I think that it > > would deserve just a bit more documentation, as it would make sure that the > > reasoning you gave is taken into account in the future code reviews. > > > > I think that if you append what I suggested above (you can use your > > judgement and edit as you will), it should be sufficient. > > Did as asked. However, it is a bit misleading given that the comment > above tpm_tis_iowrite*() describes the flush behaviour which is > conditional on CONFIG_PREEMPT_RT. Do you want this flush unconditionally > or you fine the way it is? > > drivers/char/tpm/tpm_tis.c | 43 +++++++++++++++++++++++++++++++++++++++++-- > 1 file changed, 41 insertions(+), 2 deletions(-) > > --- a/drivers/char/tpm/tpm_tis.c > +++ b/drivers/char/tpm/tpm_tis.c > @@ -50,6 +50,45 @@ static inline struct tpm_tis_tcg_phy *to > return container_of(data, struct tpm_tis_tcg_phy, priv); > } > > +#ifdef CONFIG_PREEMPT_RT > +/* > + * Flush previous write operations with a dummy read operation to the > + * TPM MMIO base address. > + */ > +static inline void tpm_tis_flush(void __iomem *iobase) > +{ > + ioread8(iobase + TPM_ACCESS(0)); > +} > +#else > +#define tpm_tis_flush(iobase) do { } while (0) > +#endif > + > +/* > + * Write a byte word to the TPM MMIO address, and flush the write queue. > + * The flush ensures that the data is sent immediately over the bus and not > + * aggregated with further requests and transferred later in a batch. The large > + * write requests can lead to unwanted latency spikes by blocking the CPU until > + * the complete batch has been transferred. > + */ > +static inline void tpm_tis_iowrite8(u8 b, void __iomem *iobase, u32 addr) > +{ > + iowrite8(b, iobase + addr); > + tpm_tis_flush(iobase); > +} > + > +/* > + * Write a 32-bit word to the TPM MMIO address, and flush the write queue. > + * The flush ensures that the data is sent immediately over the bus and not > + * aggregated with further requests and transferred later in a batch. The large > + * write requests can lead to unwanted latency spikes by blocking the CPU until > + * the complete batch has been transferred. > + */ > +static inline void tpm_tis_iowrite32(u32 b, void __iomem *iobase, u32 addr) > +{ > + iowrite32(b, iobase + addr); > + tpm_tis_flush(iobase); > +} > + > static int interrupts = -1; > module_param(interrupts, int, 0444); > MODULE_PARM_DESC(interrupts, "Enable interrupts"); > @@ -186,12 +225,12 @@ static int tpm_tcg_write_bytes(struct tp > switch (io_mode) { > case TPM_TIS_PHYS_8: > while (len--) > - iowrite8(*value++, phy->iobase + addr); > + tpm_tis_iowrite8(*value++, phy->iobase, addr); > break; > case TPM_TIS_PHYS_16: > return -EINVAL; > case TPM_TIS_PHYS_32: > - iowrite32(le32_to_cpu(*((__le32 *)value)), phy->iobase + addr); > + tpm_tis_iowrite32(le32_to_cpu(*((__le32 *)value)), phy->iobase, addr); > break; > } > Reviewed-by: Jarkko Sakkinen <jarkko@xxxxxxxxxx> BR, Jarkko