Hi Robert,

On Thu, 4 Apr 2024 21:45:05 +0800, Robert Hoo <robert.hoo.linux@xxxxxxxxx>
wrote:

> On 1/27/2024 7:42 AM, Jacob Pan wrote:
> > Hi Thomas and all,
> >
> > This patch set is aimed to improve IRQ throughput on Intel Xeon by
> > making use of posted interrupts.
> >
> > There is a session at LPC2023 IOMMU/VFIO/PCI MC where I have presented
> > this topic.
> >
> > https://lpc.events/event/17/sessions/172/#20231115
> >
> > Background
> > ==========
> > On modern x86 server SoCs, interrupt remapping (IR) is required and
> > turned on by default to support X2APIC. Two interrupt remapping modes
> > can be supported by IOMMU/VT-d:
> >
> > - Remappable (host)
> > - Posted (guest only so far)
> >
> > With remappable mode, the device MSI to CPU process is a HW flow
> > without system software touch points, it roughly goes as follows:
> >
> > 1. Devices issue interrupt requests with writes to 0xFEEx_xxxx
> > 2. The system agent accepts and remaps/translates the IRQ
> > 3. Upon receiving the translation response, the system agent
> >    notifies the destination CPU with the translated MSI
> > 4. CPU's local APIC accepts interrupts into its IRR/ISR registers
> > 5. Interrupt delivered through IDT (MSI vector)
> >
> > The above process can be inefficient under high IRQ rates. The
> > notifications in step #3 are often unnecessary when the destination CPU
> > is already overwhelmed with handling bursts of IRQs. On some
> > architectures, such as Intel Xeon, step #3 is also expensive and
> > requires strong ordering w.r.t DMA.
>
> Can you tell more on this "step #3 requires strong ordering w.r.t. DMA"?
>
I am not sure how much microarchitectural detail I can disclose, but the
point is that there are ordering rules between DMA reads/writes and posted
MSI writes. I am not a hardware expert.

From a PCIe point of view, my understanding is that the upstream writes
generated by the 4K random reads tested here on NVMe drives are
relaxed-ordered.
lspci shows RlxdOrd+ on my Samsung drives:

    DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq-
            RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
            MaxPayload 512 bytes, MaxReadReq 4096 bytes

But MSIs are strictly ordered, AFAIK.

> > As a result, slower
> > IRQ rates can become a limiting factor for DMA I/O performance.
> >

Thanks,

Jacob