> From: Jacob Pan <jacob.jun.pan@xxxxxxxxxxxxxxx>
> Sent: Saturday, April 6, 2024 6:31 AM
>
> +
> +/*
> + * De-multiplexing posted interrupts is on the performance path, the code
> + * below is written to optimize the cache performance based on the following
> + * considerations:
> + * 1.Posted interrupt descriptor (PID) fits in a cache line that is frequently
> + *   accessed by both CPU and IOMMU.
> + * 2.During posted MSI processing, the CPU needs to do 64-bit read and xchg
> + *   for checking and clearing posted interrupt request (PIR), a 256 bit field
> + *   within the PID.
> + * 3.On the other side, the IOMMU does atomic swaps of the entire PID cache
> + *   line when posting interrupts and setting control bits.
> + * 4.The CPU can access the cache line a magnitude faster than the IOMMU.
> + * 5.Each time the IOMMU does interrupt posting to the PIR will evict the PID
> + *   cache line. The cache line states after each operation are as follows:
> + *   CPU             IOMMU                   PID Cache line state
> + *   ---------------------------------------------------------------
> + *...read64                                  exclusive
> + *...lock xchg64                             modified
> + *...                post/atomic swap        invalid
> + *...-------------------------------------------------------------
> + *

According to the VT-d spec, 5.2.3 Interrupt-Posting Hardware Operation:

"
- Read contents of the Posted Interrupt Descriptor, claiming exclusive
  ownership of its hosting cache-line.
  ...
- Modify the following descriptor field values atomically:
  ...
- Promote the cache-line to be globally observable, so that the
  modifications are visible to other caching agents. Hardware may
  write-back the cache-line anytime after this step.
"

It sounds like the PID cache line is not evicted after IOMMU posting?
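
For reference, the CPU-side access pattern in item 2 of the quoted comment (a plain 64-bit read, followed by an xchg only when bits are pending) can be sketched in user-space C. This is only an illustration under stated assumptions: C11 atomics stand in for the kernel primitives, and `struct pir_sketch` and `pir_harvest()` are hypothetical names, not taken from the patch. It models only the 256-bit PIR, not the rest of the PID.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define PIR_WORDS 4	/* 256-bit PIR handled as 4 x 64-bit words */

/* Hypothetical stand-in for the PIR portion of a PID. */
struct pir_sketch {
	_Atomic uint64_t pir[PIR_WORDS];
};

/*
 * Copy the pending bits of each PIR word into out[] and clear them.
 * Returns the number of words that had at least one bit pending.
 */
int pir_harvest(struct pir_sketch *p, uint64_t out[PIR_WORDS])
{
	int i, nonzero = 0;

	assert(p && out);

	for (i = 0; i < PIR_WORDS; i++) {
		out[i] = 0;

		/* Plain 64-bit read: skip the locked operation on empty words. */
		if (atomic_load_explicit(&p->pir[i], memory_order_relaxed) == 0)
			continue;

		/* Atomic exchange with 0: fetch the pending bits and clear them. */
		out[i] = atomic_exchange(&p->pir[i], 0);
		if (out[i])
			nonzero++;
	}

	return nonzero;
}
```

A caller would initialize the words with atomic_init(), let a producer set bits with atomic_fetch_or(), and then call pir_harvest() to collect and clear them in one pass.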