On Tue, Nov 26, 2024 at 6:32 PM Tomohiro Misono (Fujitsu) <misono.tomohiro@xxxxxxxxxxx> wrote: > > > On 2024/11/21 12:55, Tomohiro Misono wrote: > > > commit 44b8f8bf2438 ("mm: memory-failure: add memory failure stats > > > > Sorry for late, I've been swamped recently. > > Hi, > Thanks for your comments. > > > > > > to sysfs") introduces per NUMA memory error stats which show > > > breakdown of HardwareCorrupted of /proc/meminfo in > > > /sys/devices/system/node/nodeX/memory_failure. > > > > Thanks for your patch. > > > > > > > > However, HardwareCorrupted also counts soft-offline pages. So, add > > > soft-offline stats in mf_stats too to represent more accurate status. > > > > Adding soft-offline stats makes sense to me. > > Thanks for confirming. Agreed with Miaohe. > > > > > > > > > This updates total count as: > > > total = recovered + ignored + failed + delayed + soft_offline> > > > Test example: > > > 1) # grep HardwareCorrupted /proc/meminfo > > > HardwareCorrupted: 0 kB > > > 2) soft-offline 1 page by madvise(MADV_SOFT_OFFLINE) > > > 3) # grep HardwareCorrupted /proc/meminfo > > > HardwareCorrupted: 4 kB > > > # grep -r "" /sys/devices/system/node/node0/memory_failure > > > /sys/devices/system/node/node0/memory_failure/total:1 > > > /sys/devices/system/node/node0/memory_failure/soft_offline:1 > > > /sys/devices/system/node/node0/memory_failure/recovered:0 > > > /sys/devices/system/node/node0/memory_failure/ignored:0 > > > /sys/devices/system/node/node0/memory_failure/failed:0 > > > /sys/devices/system/node/node0/memory_failure/delayed:0 > > > > > > Signed-off-by: Tomohiro Misono <misono.tomohiro@xxxxxxxxxxx> > > > --- > > > Hello > > > > > > This is RFC because I'm not sure adding SOFT_OFFLINE in enum > > > mf_result is a right approach. Also, maybe is it better to move > > > update_per_node_mf_stats() into num_poisoned_pages_inc()? > > > > > > I omitted some cleanups and sysfs doc update in this version to > > > highlight changes. I'd appreciate any suggestions. > > > > > > Regards, > > > Tomohiro Misono > > > > > > include/linux/mm.h | 2 ++ > > > include/linux/mmzone.h | 4 +++- > > > mm/memory-failure.c | 9 +++++++++ > > > 3 files changed, 14 insertions(+), 1 deletion(-) > > > > > > diff --git a/include/linux/mm.h b/include/linux/mm.h > > > index 5d6cd523c7c0..7f93f6883760 100644 > > > --- a/include/linux/mm.h > > > +++ b/include/linux/mm.h > > > @@ -3991,6 +3991,8 @@ enum mf_result { > > > MF_FAILED, /* Error: handling failed */ > > > MF_DELAYED, /* Will be handled later */ > > > MF_RECOVERED, /* Successfully recovered */ > > > + > > > + MF_RES_SOFT_OFFLINE, /* Soft-offline */ > > > > It might not be a good idea to add MF_RES_SOFT_OFFLINE here. 'mf_result' is used to record > > the result of memory failure handler. So it might be inappropriate to add MF_RES_SOFT_OFFLINE here. > > Understood. As I don't see other suitable place to put ENUM value, how about changing like below? > Or, do you prefer adding another ENUM type instead of this? I think SOFT_OFFLINE-ed is one of the results of successfully recovered, and the other one is HARD_OFFLINE-ed. So how about make a separate sub-ENUM for MF_RECOVERED? Something like: enum mf_recovered_result { MF_RECOVERED_SOFT_OFFLINE, MF_RECOVERED_HARD_OFFLINE, }; And 1. total = recovered + ignored + failed + delayed 2. recovered = soft_offline + hard_offline > > ``` > static void update_per_node_mf_stats(unsigned long pfn, > - enum mf_result result) > + enum mf_result result, bool is_soft_offline) > { > int nid = MAX_NUMNODES; > struct memory_failure_stats *mf_stats = NULL; > @@ -1299,6 +1299,12 @@ static void update_per_node_mf_stats(unsigned long pfn, > } > > mf_stats = &NODE_DATA(nid)->mf_stats; > + if (is_soft_offline) { > + ++mf->stats->soft_offlined; > + ++mf_stats->total; > + return; > + } > + > switch (result) { > case MF_IGNORED: > ++mf_stats->ignored; > ``` > > Regards, > Tomohiro Misono > > > > > > > Thanks. > > . >