> >>> And
> >>> 1. total = recovered + ignored + failed + delayed
> >>> 2. recovered = soft_offline + hard_offline
> >> Do you mean mf_stats now have 7 entries in sysfs?
> >> (total, ignored, failed, delayed, recovered, hard_offline, soft_offline,
> >> then recovered = hard_offline + soft_offline)
> >> Or 6 entries ? (in that case, hard_offline = recovered - soft_offline)
> >> It might be simpler to understand for user if total is just the sum of other entries like this RFC,
> >> but I'd like to know other opinions.
> > Will it be better to have below items?
> > "
> > total
> > ignored
> > failed
> > delayed
> > hard_offline
> > soft_offline
> > "
>
> The existing "ignored, failed, delayed, recovered" apply to UEs while
> "soft_offline" applies to CE. The difference between UE and CE is that
> even a recovered UE page has PG_hwpoison set, but a soft offlined page
> does not and thus could be re-deployed.

Hi, thanks for your comments.

If I understand correctly, PG_hwpoison is also set on a soft-offlined page
(and so the page is counted in HardwareCorrupted too):
https://github.com/torvalds/linux/blob/v6.13-rc2/mm/memory-failure.c#L206

Also, unpoison works, but it can only be used via debugfs through the
hwpoison-inject module. Is this correct?

> So if we want to flag CE pages, they seem to belong to a different
> category, something like -
>
> /sys/devices/system/node/node0/memory_failure/Uncorrected/{ignored, delayed, failed, recovered}
> /sys/devices/system/node/node0/memory_failure/Corrected/{offlined}

This makes sense. But as I stated in the other thread, I don't think we can
change the current I/F for "Uncorrected".
Is it worth creating only the "Corrected" dir?

Regards
Tomohiro Misono
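
For reference, the "total is just the sum of other entries" relation discussed above can be checked from userspace with a short script. This is only a rough sketch: it assumes the per-node counters are plain integer files under <node dir>/memory_failure/, and the helper names (read_mf_stats, total_is_consistent) are made up for illustration. "soft_offline" and "hard_offline" are only proposals in this thread, so they are not in the default list; files that do not exist are skipped.

```python
from pathlib import Path

# Counters discussed in the thread as already existing (besides "total").
UE_COUNTERS = ("ignored", "failed", "delayed", "recovered")


def read_mf_stats(node_dir, names=("total",) + UE_COUNTERS):
    """Read integer counters from <node_dir>/memory_failure/; skip absent files."""
    base = Path(node_dir) / "memory_failure"
    stats = {}
    for name in names:
        path = base / name
        if path.is_file():
            stats[name] = int(path.read_text().strip())
    return stats


def total_is_consistent(stats, parts=UE_COUNTERS):
    """Return True if 'total' equals the sum of the listed counters."""
    return stats["total"] == sum(stats.get(p, 0) for p in parts)
```

For example, read_mf_stats("/sys/devices/system/node/node0") returns a dict of whatever counters are present on that node, and a different `parts` tuple can be passed to total_is_consistent() to try out one of the proposed layouts.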