From: Haris Okanovic <haris.okanovic@xxxxxx> ioread8() operations to TPM MMIO addresses can stall the CPU when immediately following a sequence of iowrite*()'s to the same region. For example, cyclitest measures ~400us latency spikes when a non-RT usermode application communicates with an SPI-based TPM chip (Intel Atom E3940 system, PREEMPT_RT kernel). The spikes are caused by a stalling ioread8() operation following a sequence of 30+ iowrite8()s to the same address. I believe this happens because the write sequence is buffered (in CPU or somewhere along the bus), and gets flushed on the first LOAD instruction (ioread*()) that follows. The enclosed change appears to fix this issue: read the TPM chip's access register (status code) after every iowrite*() operation to amortize the cost of flushing data to chip across multiple instructions. Signed-off-by: Haris Okanovic <haris.okanovic@xxxxxx> Link: https://lore.kernel.org/r/20230323153436.B2SATnZV@xxxxxxxxxxxxx Signed-off-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> --- v1…v2: - Updated/ added comments as per Jarkko Sakkinen. On 2023-03-30 03:25:34 [+0300], Jarkko Sakkinen wrote: … > I would replace this with: > > /* > * Flush previous write operations with a dummy read operation to the > * TPM MMIO base address. > */ > > I think rest of the reasoning would be better place to the functions, > which are call sites for this helper, and here it would be make more > sense to explain what it actually does. … > > Thanks for catching this up. It is a small code change but I think that it > would deserve just a bit more documentation, as it would make sure that the > reasoning you gave is taken into account in the future code reviews. > > I think that if you append what I suggested above (you can use your > judgement and edit as you will), it should be sufficient. Did as asked. However, it is a bit misleading given that the comment above tpm_tis_iowrite*() describes the flush behaviour which is conditional on CONFIG_PREEMPT_RT. Do you want this flush unconditionally or you fine the way it is? drivers/char/tpm/tpm_tis.c | 43 +++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 41 insertions(+), 2 deletions(-) --- a/drivers/char/tpm/tpm_tis.c +++ b/drivers/char/tpm/tpm_tis.c @@ -50,6 +50,45 @@ static inline struct tpm_tis_tcg_phy *to return container_of(data, struct tpm_tis_tcg_phy, priv); } +#ifdef CONFIG_PREEMPT_RT +/* + * Flush previous write operations with a dummy read operation to the + * TPM MMIO base address. + */ +static inline void tpm_tis_flush(void __iomem *iobase) +{ + ioread8(iobase + TPM_ACCESS(0)); +} +#else +#define tpm_tis_flush(iobase) do { } while (0) +#endif + +/* + * Write a byte word to the TPM MMIO address, and flush the write queue. + * The flush ensures that the data is sent immediately over the bus and not + * aggregated with further requests and transferred later in a batch. The large + * write requests can lead to unwanted latency spikes by blocking the CPU until + * the complete batch has been transferred. + */ +static inline void tpm_tis_iowrite8(u8 b, void __iomem *iobase, u32 addr) +{ + iowrite8(b, iobase + addr); + tpm_tis_flush(iobase); +} + +/* + * Write a 32-bit word to the TPM MMIO address, and flush the write queue. + * The flush ensures that the data is sent immediately over the bus and not + * aggregated with further requests and transferred later in a batch. The large + * write requests can lead to unwanted latency spikes by blocking the CPU until + * the complete batch has been transferred. + */ +static inline void tpm_tis_iowrite32(u32 b, void __iomem *iobase, u32 addr) +{ + iowrite32(b, iobase + addr); + tpm_tis_flush(iobase); +} + static int interrupts = -1; module_param(interrupts, int, 0444); MODULE_PARM_DESC(interrupts, "Enable interrupts"); @@ -186,12 +225,12 @@ static int tpm_tcg_write_bytes(struct tp switch (io_mode) { case TPM_TIS_PHYS_8: while (len--) - iowrite8(*value++, phy->iobase + addr); + tpm_tis_iowrite8(*value++, phy->iobase, addr); break; case TPM_TIS_PHYS_16: return -EINVAL; case TPM_TIS_PHYS_32: - iowrite32(le32_to_cpu(*((__le32 *)value)), phy->iobase + addr); + tpm_tis_iowrite32(le32_to_cpu(*((__le32 *)value)), phy->iobase, addr); break; }