> > > On 2024/11/21 12:55, Tomohiro Misono wrote: > > > > commit 44b8f8bf2438 ("mm: memory-failure: add memory failure stats > > > > > > Sorry for late, I've been swamped recently. > > > > Hi, > > Thanks for your comments. > > > > > > > > > to sysfs") introduces per NUMA memory error stats which show > > > > breakdown of HardwareCorrupted of /proc/meminfo in > > > > /sys/devices/system/node/nodeX/memory_failure. > > > > > > Thanks for your patch. > > > > > > > > > > > However, HardwareCorrupted also counts soft-offline pages. So, add > > > > soft-offline stats in mf_stats too to represent more accurate status. > > > > > > Adding soft-offline stats makes sense to me. > > > > Thanks for confirming. > > Agreed with Miaohe. > > > > > > > > > > > > > > This updates total count as: > > > > total = recovered + ignored + failed + delayed + soft_offline> > > > > Test example: > > > > 1) # grep HardwareCorrupted /proc/meminfo > > > > HardwareCorrupted: 0 kB > > > > 2) soft-offline 1 page by madvise(MADV_SOFT_OFFLINE) > > > > 3) # grep HardwareCorrupted /proc/meminfo > > > > HardwareCorrupted: 4 kB > > > > # grep -r "" /sys/devices/system/node/node0/memory_failure > > > > /sys/devices/system/node/node0/memory_failure/total:1 > > > > /sys/devices/system/node/node0/memory_failure/soft_offline:1 > > > > /sys/devices/system/node/node0/memory_failure/recovered:0 > > > > /sys/devices/system/node/node0/memory_failure/ignored:0 > > > > /sys/devices/system/node/node0/memory_failure/failed:0 > > > > /sys/devices/system/node/node0/memory_failure/delayed:0 > > > > > > > > Signed-off-by: Tomohiro Misono <misono.tomohiro@xxxxxxxxxxx> > > > > --- > > > > Hello > > > > > > > > This is RFC because I'm not sure adding SOFT_OFFLINE in enum > > > > mf_result is a right approach. Also, maybe is it better to move > > > > update_per_node_mf_stats() into num_poisoned_pages_inc()? > > > > > > > > I omitted some cleanups and sysfs doc update in this version to > > > > highlight changes. I'd appreciate any suggestions. > > > > > > > > Regards, > > > > Tomohiro Misono > > > > > > > > include/linux/mm.h | 2 ++ > > > > include/linux/mmzone.h | 4 +++- > > > > mm/memory-failure.c | 9 +++++++++ > > > > 3 files changed, 14 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/include/linux/mm.h b/include/linux/mm.h > > > > index 5d6cd523c7c0..7f93f6883760 100644 > > > > --- a/include/linux/mm.h > > > > +++ b/include/linux/mm.h > > > > @@ -3991,6 +3991,8 @@ enum mf_result { > > > > MF_FAILED, /* Error: handling failed */ > > > > MF_DELAYED, /* Will be handled later */ > > > > MF_RECOVERED, /* Successfully recovered */ > > > > + > > > > + MF_RES_SOFT_OFFLINE, /* Soft-offline */ > > > > > > It might not be a good idea to add MF_RES_SOFT_OFFLINE here. 'mf_result' is used to record > > > the result of memory failure handler. So it might be inappropriate to add MF_RES_SOFT_OFFLINE here. > > > > Understood. As I don't see other suitable place to put ENUM value, how about changing like below? > > Or, do you prefer adding another ENUM type instead of this? > > I think SOFT_OFFLINE-ed is one of the results of successfully > recovered, and the other one is HARD_OFFLINE-ed. So how about make a > separate sub-ENUM for MF_RECOVERED? Something like: Thanks for the suggestion. > > enum mf_recovered_result { > MF_RECOVERED_SOFT_OFFLINE, > MF_RECOVERED_HARD_OFFLINE, > }; Ok. > > And > 1. total = recovered + ignored + failed + delayed > 2. recovered = soft_offline + hard_offline Do you mean mf_stats now have 7 entries in sysfs? (total, ignored, failed, delayed, recovered, hard_offline, soft_offline, then recovered = hard_offline + soft_offline) Or 6 entries ? (in that case, hard_offline = recovered - soft_offline) It might be simpler to understand for user if total is just the sum of other entries like this RFC, but I'd like to know other opinions. Regards, Tomohiro Misono > > > > > ``` > > static void update_per_node_mf_stats(unsigned long pfn, > > - enum mf_result result) > > + enum mf_result result, bool is_soft_offline) > > { > > int nid = MAX_NUMNODES; > > struct memory_failure_stats *mf_stats = NULL; > > @@ -1299,6 +1299,12 @@ static void update_per_node_mf_stats(unsigned long pfn, > > } > > > > mf_stats = &NODE_DATA(nid)->mf_stats; > > + if (is_soft_offline) { > > + ++mf->stats->soft_offlined; > > + ++mf_stats->total; > > + return; > > + } > > + > > switch (result) { > > case MF_IGNORED: > > ++mf_stats->ignored; > > ``` > > > > Regards, > > Tomohiro Misono > > > > > > > > > > > Thanks. > > > . > >