On Mon, 16 Jan 2023 19:39:01 +0000 Jiaqi Yan <jiaqiyan@xxxxxxxxxx> wrote:

> Right before memory_failure finishes its handling, accumulate poisoned
> page's resolution counters to pglist_data's memory_failure_stats, so as
> to update the corresponding sysfs entries.
>
> Tested:
> 1) Start an application to allocate memory buffer chunks
> 2) Convert random memory buffer addresses to physical addresses
> 3) Inject memory errors using EINJ at chosen physical addresses
> 4) Access poisoned memory buffer and recover from SIGBUS
> 5) Check counter values under
>    /sys/devices/system/node/node*/memory_failure/pages_*
>
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1227,6 +1227,39 @@ static struct page_state error_states[] = {
>  #undef slab
>  #undef reserved
>
> +static void update_per_node_mf_stats(unsigned long pfn,
> +				     enum mf_result result)
> +{
> +	int nid = MAX_NUMNODES;
> +	struct memory_failure_stats *mf_stats = NULL;
> +
> +	nid = pfn_to_nid(pfn);
> +	if (unlikely(nid < 0 || nid >= MAX_NUMNODES)) {
> +		WARN_ONCE(1, "Memory failure: pfn=%#lx, invalid nid=%d", pfn, nid);
> +		return;
> +	}
> +
> +	mf_stats = &NODE_DATA(nid)->mf_stats;
> +	switch (result) {
> +	case MF_IGNORED:
> +		++mf_stats->pages_ignored;

What is the locking here, to prevent concurrent increments?

> +		break;
> +	case MF_FAILED:
> +		++mf_stats->pages_failed;
> +		break;
> +	case MF_DELAYED:
> +		++mf_stats->pages_delayed;
> +		break;
> +	case MF_RECOVERED:
> +		++mf_stats->pages_recovered;
> +		break;
> +	default:
> +		WARN_ONCE(1, "Memory failure: mf_result=%d is not properly handled", result);
> +		break;
> +	}
> +	++mf_stats->pages_poisoned;
> +}
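
To illustrate the concern about concurrent increments, here is a minimal userspace sketch of one possible approach: making each counter atomic so that no lock is needed around the increment. It is not the patch's code. The struct name mf_stats_atomic and the helper mf_stats_account() are hypothetical, the enum is a stand-in for the kernel's enum mf_result, and C11 <stdatomic.h> is used in place of kernel primitives (an in-kernel version would more plausibly use atomic_long_t with atomic_long_inc(), or rely on a lock already held by the caller, if one exists).

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's enum mf_result. */
enum mf_result { MF_IGNORED, MF_FAILED, MF_DELAYED, MF_RECOVERED };

/* Hypothetical lock-free variant of the per-node statistics. */
struct mf_stats_atomic {
	atomic_ulong pages_poisoned;
	atomic_ulong pages_ignored;
	atomic_ulong pages_failed;
	atomic_ulong pages_delayed;
	atomic_ulong pages_recovered;
};

/*
 * Account one handled page. Returns 0 on success, -1 if result is not a
 * known value. As in the quoted patch, pages_poisoned is bumped either way.
 */
static int mf_stats_account(struct mf_stats_atomic *s, enum mf_result result)
{
	atomic_ulong *counter = NULL;

	switch (result) {
	case MF_IGNORED:
		counter = &s->pages_ignored;
		break;
	case MF_FAILED:
		counter = &s->pages_failed;
		break;
	case MF_DELAYED:
		counter = &s->pages_delayed;
		break;
	case MF_RECOVERED:
		counter = &s->pages_recovered;
		break;
	default:
		break;
	}

	/* Relaxed ordering: these are statistics, not synchronization. */
	if (counter)
		atomic_fetch_add_explicit(counter, 1, memory_order_relaxed);
	atomic_fetch_add_explicit(&s->pages_poisoned, 1, memory_order_relaxed);

	return counter ? 0 : -1;
}
```

Each increment is then individually safe against concurrent callers, but a reader may still briefly see pages_poisoned disagree with the sum of the other counters. If that matters, serializing the whole update under a lock would be the alternative.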