And
1. total = recovered + ignored + failed + delayed
2. recovered = soft_offline + hard_offline
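
To make the two relations above concrete, here is a minimal sketch in C. It assumes a hypothetical `struct mf_stats` whose fields mirror the counter names used in this thread. It is not the kernel's actual data structure, and the helper names `mf_recovered()` and `mf_total()` are made up for illustration only.

```c
/*
 * Hypothetical counter set: field names follow the entries discussed in
 * this thread, not any real kernel structure.
 */
struct mf_stats {
	unsigned long ignored;
	unsigned long failed;
	unsigned long delayed;
	unsigned long hard_offline;
	unsigned long soft_offline;
};

/* Relation 2: recovered = soft_offline + hard_offline */
static unsigned long mf_recovered(const struct mf_stats *s)
{
	return s->soft_offline + s->hard_offline;
}

/* Relation 1: total = recovered + ignored + failed + delayed */
static unsigned long mf_total(const struct mf_stats *s)
{
	return mf_recovered(s) + s->ignored + s->failed + s->delayed;
}
```

With only five stored counters, both "recovered" and "total" are derived values, so they cannot drift out of sync with the entries they summarize.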
Do you mean mf_stats would now have 7 entries in sysfs
(total, ignored, failed, delayed, recovered, hard_offline, soft_offline, with recovered = hard_offline + soft_offline)?
Or 6 entries (in which case hard_offline = recovered - soft_offline)?
It might be simpler for users to understand if total is just the sum of the other entries, as in this RFC,
but I'd like to hear other opinions.
Would it be better to have the items below?
"
total
ignored
failed
delayed
hard_offline
soft_offline
"
The existing "ignored, failed, delayed, recovered" counters apply to UEs
(uncorrected errors), while "soft_offline" applies to CEs (corrected errors).
The difference between a UE and a CE is that even a recovered UE page has
PG_hwpoison set, whereas a soft-offlined page does not and thus could be
re-deployed.
So if we want to flag CE pages, they seem to belong to a different
category, something like -
/sys/devices/system/node/node0/memory_failure/Uncorrected/{ignored, delayed, failed, recovered}
/sys/devices/system/node/node0/memory_failure/Corrected/{offlined}
Thanks,
-jane
Though this would break the previous interface.
Any thoughts?
Thanks.