On 2024/11/21 12:55, Tomohiro Misono wrote: > commit 44b8f8bf2438 ("mm: memory-failure: add memory failure stats Sorry for late, I've been swamped recently. > to sysfs") introduces per NUMA memory error stats which show > breakdown of HardwareCorrupted of /proc/meminfo in > /sys/devices/system/node/nodeX/memory_failure. Thanks for your patch. > > However, HardwareCorrupted also counts soft-offline pages. So, add > soft-offline stats in mf_stats too to represent more accurate status. Adding soft-offline stats makes sense to me. > > This updates total count as: > total = recovered + ignored + failed + delayed + soft_offline> > Test example: > 1) # grep HardwareCorrupted /proc/meminfo > HardwareCorrupted: 0 kB > 2) soft-offline 1 page by madvise(MADV_SOFT_OFFLINE) > 3) # grep HardwareCorrupted /proc/meminfo > HardwareCorrupted: 4 kB > # grep -r "" /sys/devices/system/node/node0/memory_failure > /sys/devices/system/node/node0/memory_failure/total:1 > /sys/devices/system/node/node0/memory_failure/soft_offline:1 > /sys/devices/system/node/node0/memory_failure/recovered:0 > /sys/devices/system/node/node0/memory_failure/ignored:0 > /sys/devices/system/node/node0/memory_failure/failed:0 > /sys/devices/system/node/node0/memory_failure/delayed:0 > > Signed-off-by: Tomohiro Misono <misono.tomohiro@xxxxxxxxxxx> > --- > Hello > > This is RFC because I'm not sure adding SOFT_OFFLINE in enum > mf_result is a right approach. Also, maybe is it better to move > update_per_node_mf_stats() into num_poisoned_pages_inc()? > > I omitted some cleanups and sysfs doc update in this version to > highlight changes. I'd appreciate any suggestions. > > Regards, > Tomohiro Misono > > include/linux/mm.h | 2 ++ > include/linux/mmzone.h | 4 +++- > mm/memory-failure.c | 9 +++++++++ > 3 files changed, 14 insertions(+), 1 deletion(-) > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index 5d6cd523c7c0..7f93f6883760 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -3991,6 +3991,8 @@ enum mf_result { > MF_FAILED, /* Error: handling failed */ > MF_DELAYED, /* Will be handled later */ > MF_RECOVERED, /* Successfully recovered */ > + > + MF_RES_SOFT_OFFLINE, /* Soft-offline */ It might not be a good idea to add MF_RES_SOFT_OFFLINE here. 'mf_result' is used to record the result of memory failure handler. So it might be inappropriate to add MF_RES_SOFT_OFFLINE here. Thanks. .