On Mon, Jan 16, 2023 at 07:39:00PM +0000, Jiaqi Yan wrote:
> Today kernel provides following memory error info to userspace, but each
> has its own disadvantage
> * HardwareCorrupted in /proc/meminfo: number of bytes poisoned in total,
>   not per NUMA node stats though
> * ras:memory_failure_event: only available after explicitly enabled
> * /dev/mcelog provides many useful info about the MCEs, but
>   doesn't capture how memory_failure recovered memory MCEs
> * kernel logs: userspace needs to process log text
>
> Exposes per NUMA node memory error stats as sysfs entries:
>
>   /sys/devices/system/node/node${X}/memory_failure/pages_poisoned
>   /sys/devices/system/node/node${X}/memory_failure/pages_recovered
>   /sys/devices/system/node/node${X}/memory_failure/pages_ignored
>   /sys/devices/system/node/node${X}/memory_failure/pages_failed
>   /sys/devices/system/node/node${X}/memory_failure/pages_delayed
>
> These counters describe how many raw pages are poisoned and after the
> attempted recoveries by the kernel, their resolutions: how many are
> recovered, ignored, failed, or delayed respectively.
>
> The following math holds for the statistics:
> * pages_poisoned = pages_recovered + pages_ignored + pages_failed +
>   pages_delayed
> * pages_poisoned * PAGE_SIZE = /proc/meminfo/HardwareCorrupted
>
> Acked-by: David Rientjes <rientjes@xxxxxxxxxx>
> Signed-off-by: Jiaqi Yan <jiaqiyan@xxxxxxxxxx>
...
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index cd28a100d9e4..0a14b35a96da 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1110,6 +1110,31 @@ struct deferred_split {
>  };
>  #endif
>
> +#ifdef CONFIG_MEMORY_FAILURE
> +/*
> + * Per NUMA node memory failure handling statistics.
> + */
> +struct memory_failure_stats {
> +	/*
> +	 * Number of pages poisoned.
> +	 * Cases not accounted: memory outside kernel control, offline page,
> +	 * arch-specific memory_failure (SGX), and hwpoison_filter()
> +	 * filtered error events.
> +	 */

Yes, this comment is important. Because of these unaccounted cases, the
sum of the pages_poisoned counters over all NUMA nodes may not match the
global counter shown in /proc/meminfo. But this keeps the code simple, and
the new stats are probably useful enough even without covering the special
cases, so I'm OK with this.

BTW, maybe "unpoison" could also be mentioned here?

Thanks,
Naoya Horiguchi