On Fri, Aug 04, 2017 at 04:56:51PM -0500, Haris Okanovic wrote:
> I have a latency issue using a SPI-based TPM chip with tpm_tis driver
> from non-rt usermode application, which induces ~400 us latency spikes
> in cyclictest (Intel Atom E3940 system, PREEMPT_RT_FULL kernel).
>
> The spikes are caused by a stalling ioread8() operation, following a
> sequence of 30+ iowrite8()s to the same address. I believe this happens
> because the writes are cached (in cpu or somewhere along the bus), which
> gets flushed on the first LOAD instruction (ioread*()) that follows.

To use the ARM parlance, these accesses aren't "cached" (which would
imply that a result could be returned to the load from any intermediate
node in the interconnect), but instead are "bufferable".

It is really unfortunate that we continue to run into this class of
problem across various CPU vendors and various underlying bus
technologies; it's the continuing curse of running PREEMPT_RT on
commodity hardware.  RT is not easy :)

> The enclosed change appears to fix this issue: read the TPM chip's
> access register (status code) after every iowrite*() operation.

Are we engaged in a game of whack-a-mole with all of the drivers which
use this same access pattern (of which I imagine there are quite a
few!)?

I'm wondering if we should explore the idea of adding a load in the
iowriteN()/writeX() macros (marking those accesses in which reads cause
side effects explicitly, redirecting to a _raw() variant or something).

Obviously that would be expensive for non-RT use cases, but for helping
constrain latency, it may be worth it for RT.

   Julia
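
For illustration only, the write-then-read-back idea can be sketched as
a small user-space model. This is not kernel code and not the enclosed
patch: struct model_dev and the model_*() helpers are hypothetical names
invented here, and the "bus" is just a counter of posted writes that a
read has to drain before it can complete.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a device behind a bus that posts (buffers) writes. */
struct model_dev {
	uint8_t regs[4];       /* device registers */
	unsigned pending;      /* writes still sitting in the bus buffer */
	unsigned worst_stall;  /* most writes any single read had to drain */
};

/*
 * Posted write: returns immediately, the write is only queued.
 * Simplification: the register value is updated right away, only the
 * cost of draining is deferred.
 */
static void model_iowrite8(struct model_dev *d, uint8_t val, size_t reg)
{
	assert(reg < sizeof(d->regs));
	d->regs[reg] = val;
	d->pending++;
}

/* Read: has to wait for every earlier posted write to drain first. */
static uint8_t model_ioread8(struct model_dev *d, size_t reg)
{
	assert(reg < sizeof(d->regs));
	if (d->pending > d->worst_stall)
		d->worst_stall = d->pending;
	d->pending = 0;
	return d->regs[reg];
}

/*
 * Write followed by a read of a register assumed to have no read side
 * effects, so each read only ever drains a single write.
 */
static void model_iowrite8_flush(struct model_dev *d, uint8_t val,
				 size_t reg, size_t safe_reg)
{
	model_iowrite8(d, val, reg);
	(void)model_ioread8(d, safe_reg);
}
```

In this model, 32 back-to-back model_iowrite8() calls leave the next
read to drain all 32 writes at once, while the same sequence through
model_iowrite8_flush() never makes a read drain more than one. The
total cost does not go away, it is spread out, which is the trade-off
between throughput and bounded latency mentioned above.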