On Wed, Feb 1, 2023 at 10:56 PM Kefeng Wang <wangkefeng.wang@xxxxxxxxxx> wrote: > > > > On 2023/1/20 11:46, Jiaqi Yan wrote: > > Right before memory_failure finishes its handling, accumulate poisoned > > page's resolution counters to pglist_data's memory_failure_stats, so as > > to update the corresponding sysfs entries. > > > > Tested: > > 1) Start an application to allocate memory buffer chunks > > 2) Convert random memory buffer addresses to physical addresses > > 3) Inject memory errors using EINJ at chosen physical addresses > > 4) Access poisoned memory buffer and recover from SIGBUS > > 5) Check counter values under > > /sys/devices/system/node/node*/memory_failure/* > > > > Acked-by: David Rientjes <rientjes@xxxxxxxxxx> > > Signed-off-by: Jiaqi Yan <jiaqiyan@xxxxxxxxxx> > > --- > > mm/memory-failure.c | 36 ++++++++++++++++++++++++++++++++++++ > > 1 file changed, 36 insertions(+) > > > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c > > index c628f1db3a4d..f4990839ea66 100644 > > --- a/mm/memory-failure.c > > +++ b/mm/memory-failure.c > > @@ -1227,6 +1227,39 @@ static struct page_state error_states[] = { > > #undef slab > > #undef reserved > > > > +static void update_per_node_mf_stats(unsigned long pfn, > > + enum mf_result result) > > +{ > > + int nid = MAX_NUMNODES; > > + struct memory_failure_stats *mf_stats = NULL; > > + > > + nid = pfn_to_nid(pfn); > > + if (unlikely(nid < 0 || nid >= MAX_NUMNODES)) { > > + WARN_ONCE(1, "Memory failure: pfn=%#lx, invalid nid=%d", pfn, nid); > > + return; > > + } > > + > ... > > + default: > > + WARN_ONCE(1, "Memory failure: mf_result=%d is not properly handled", result); > > + break; > > + } > > We already define pr_fmt, the "Memory failure:" prefix should be dropped. "Should be dropped" because it will print duplicated prefixes? Does WARN_ONCE also automatically include pr_fmt? I don't think that's the case when I read __warn_printk. This is what I saw from dmesg when I add a `WARN_ONCE(1, "Memory failure: pfn=%#lx\n", pfn);` at the beginning of `update_per_node_mf_stats` (above `nid=pfn_to_nid(pfn)`): [ 523.942688] ------------[ cut here ]------------ [ 523.972026] Memory failure: pfn=0x309f8f3 [ 523.972038] WARNING: CPU: 4 PID: 21119 at mm/memory-failure.c:1236 action_result+0xec/0x150 [ 523.972044] Modules linked in: einj vfat fat i2c_mux_pca954x i2c_mux spidev cdc_acm xhci_pci xhci_hcd sha3_generic gq(O) [ 523.972054] CPU: 4 PID: 21119 Comm: usemem Tainted: G S M O 6.2.0-smp-DEV #1 [ 523.972059] RIP: 0010:action_result+0xec/0x150 No duplicated "Memory failure:". But I realize I should probably add "\n" within WARN_ONCE.