Re: [PATCH 00/15] Coalesced Interrupt Delivery with posted MSI

Jacob Pan <jacob.jun.pan@xxxxxxxxxxxxxxx> · Thu, 4 Apr 2024 10:37:35 -0700

Hi Robert,

On Thu, 4 Apr 2024 21:45:05 +0800, Robert Hoo <robert.hoo.linux@xxxxxxxxx>
wrote:

> On 1/27/2024 7:42 AM, Jacob Pan wrote:
> > Hi Thomas and all,
> > 
> > This patch set is aimed to improve IRQ throughput on Intel Xeon by
> > making use of posted interrupts.
> > 
> > There is a session at LPC2023 IOMMU/VFIO/PCI MC where I have presented
> > this topic.
> > 
> > https://lpc.events/event/17/sessions/172/#20231115
> > 
> > Background
> > ==========
> > On modern x86 server SoCs, interrupt remapping (IR) is required and
> > turned on by default to support X2APIC. Two interrupt remapping modes
> > can be supported by IOMMU/VT-d:
> > 
> > - Remappable 	(host)
> > - Posted	(guest only so far)
> > 
> > With remappable mode, the device MSI to CPU process is a HW flow
> > without system software touch points, it roughly goes as follows:
> > 
> > 1.	Devices issue interrupt requests with writes to 0xFEEx_xxxx
> > 2.	The system agent accepts and remaps/translates the IRQ
> > 3.	Upon receiving the translation response, the system agent
> > notifies the destination CPU with the translated MSI
> > 4.	CPU's local APIC accepts interrupts into its IRR/ISR registers
> > 5.	Interrupt delivered through IDT (MSI vector)
> > 
> > The above process can be inefficient under high IRQ rates. The
> > notifications in step #3 are often unnecessary when the destination CPU
> > is already overwhelmed with handling bursts of IRQs. On some
> > architectures, such as Intel Xeon, step #3 is also expensive and
> > requires strong ordering w.r.t DMA.   
> 
> Can you tell more on this "step #3 requires strong ordering w.r.t. DMA"?
> 
I am not sure how much microarchitectural detail I can disclose, but the
point is that there are ordering rules between DMA reads/writes and
posted MSI writes. I am not a hardware expert.

From a PCIe point of view, my understanding is that the upstream writes
tested here, which the NVMe drives issue as a result of 4K random reads,
use relaxed ordering. lspci shows RlxdOrd+ on my Samsung drives:

    DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq-
            RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
            MaxPayload 512 bytes, MaxReadReq 4096 bytes

MSI writes, however, are strictly ordered as far as I know.
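
For reference, RlxdOrd in the lspci output is the Enable Relaxed Ordering
bit (bit 4) of the PCIe Device Control register. It only permits the device
to set the relaxed ordering attribute on its requests. Below is a rough
sketch of decoding the fields relevant here. The bit masks mirror the
PCI_EXP_DEVCTL_* definitions in the kernel's pci_regs.h; the function name
and the raw value 0x5957 are my own assumptions, the latter reconstructed
from the flags printed above rather than read from the drive.

```python
# Field masks of the 16-bit PCIe Device Control register.
DEVCTL_RELAX_EN = 0x0010    # Enable Relaxed Ordering (bit 4)
DEVCTL_PAYLOAD = 0x00E0     # Max Payload Size encoding (bits 7:5)
DEVCTL_NOSNOOP_EN = 0x0800  # Enable No Snoop (bit 11)
DEVCTL_READRQ = 0x7000      # Max Read Request Size encoding (bits 14:12)


def decode_devctl(devctl):
    """Decode the ordering- and size-related DevCtl fields (hypothetical helper)."""
    return {
        "RlxdOrd": bool(devctl & DEVCTL_RELAX_EN),
        "NoSnoop": bool(devctl & DEVCTL_NOSNOOP_EN),
        # Both size fields encode 128 bytes shifted left by the field value.
        "MaxPayload": 128 << ((devctl & DEVCTL_PAYLOAD) >> 5),
        "MaxReadReq": 128 << ((devctl & DEVCTL_READRQ) >> 12),
    }


# Assumed raw value, chosen to match the lspci flags quoted above.
EXAMPLE_DEVCTL = 0x5957

if __name__ == "__main__":
    print(decode_devctl(EXAMPLE_DEVCTL))
```

The raw register can be read with setpci or from the device's config space
in sysfs; the sketch above deliberately leaves that part out.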

> > As a result, slower
> > IRQ rates can become a limiting factor for DMA I/O performance.
> >   
> 
> 

Thanks,

Jacob