On Nov 3, 2022, at 9:27 AM, Luck, Tony <tony.luck@xxxxxxxxx> wrote:
>> - HPS usually doesn’t consume CPU cores but does consume memory
>> controller cycles and memory bandwidth. SW consumes both CPU cycles
>> and memory bandwidth, but is only a problem if administrators opt into
>> the scanning after weighing the cost benefit.
>
> Maybe there is a middle ground on platforms that support some s/w programmable
> DMA engine that can detect memory errors in a way that doesn't signal a
> fatal system error. Your s/w scanner can direct that DMA engine to read from
> the regions of memory that you want to scan, at a frequency that is compatible
> with your system load requirements and risk assessments.
>
> If your idea gets traction, maybe structure the code so that it can either use
> a CPU core to scan a block of memory, or pass requests to a platform driver that can
> use a DMA engine to perform the scan.

That’s exactly what I was about to write. :) QuickAssist can be perfect for that.
The IOMMU can be programmed to make the memory uncacheable.
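
To make the "either a CPU core or a platform driver" structure concrete, here is a rough user-space sketch in C. It is only an illustration under stated assumptions: struct scan_backend, cpu_scan(), dma_scan_unavailable() and scan_block() are hypothetical names, not existing kernel or QuickAssist interfaces, and the DMA backend is a placeholder that always reports itself as unavailable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical backend interface; not an existing kernel API. */
struct scan_backend {
    const char *name;
    /* Read every byte in [addr, addr + len). Returns 0 on success. */
    int (*scan)(const volatile uint8_t *addr, size_t len);
};

/* CPU backend: a core reads the block directly. */
static int cpu_scan(const volatile uint8_t *addr, size_t len)
{
    for (size_t i = 0; i < len; i++)
        (void)addr[i]; /* volatile read, so the load is not optimized away */
    return 0;
}

/*
 * DMA backend placeholder (assumption): a real one would hand the
 * request to a platform driver. This stub always fails, so callers
 * exercise the fallback path.
 */
static int dma_scan_unavailable(const volatile uint8_t *addr, size_t len)
{
    (void)addr;
    (void)len;
    return -1;
}

static const struct scan_backend cpu_backend = { "cpu", cpu_scan };
static const struct scan_backend dma_backend = { "dma", dma_scan_unavailable };

/*
 * Scan one block with the preferred backend, falling back to the CPU
 * backend if the preferred one is missing or fails. Returns the backend
 * that completed the scan, or NULL if none did.
 */
static const struct scan_backend *
scan_block(const struct scan_backend *preferred, const void *addr, size_t len)
{
    assert(addr != NULL);

    if (preferred && preferred->scan && preferred->scan(addr, len) == 0)
        return preferred;

    return cpu_backend.scan(addr, len) == 0 ? &cpu_backend : NULL;
}
```

With this shape the scanner core only ever calls scan_block(); whether a block is read by a core or by an offload engine is decided by which backend is passed in, and a platform without a usable engine quietly ends up on the CPU path.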