On Tue, Jun 14, 2022 at 12:38:29PM +0800, zhenwei pi wrote:
> Add a new debug entry to show the number of hwpoisoned pages. And
> use module_get/module_put to manager this kernel module, don't allow
> to remove this module unless hwpoisoned-pages is zero.
>
> Signed-off-by: zhenwei pi <pizhenwei@xxxxxxxxxxxxx>
> ---
>  Documentation/vm/hwpoison.rst |  4 ++++
>  mm/hwpoison-inject.c          | 19 ++++++++++++++++++-
>  2 files changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/vm/hwpoison.rst b/Documentation/vm/hwpoison.rst
> index c742de1769d1..c832a8b192d4 100644
> --- a/Documentation/vm/hwpoison.rst
> +++ b/Documentation/vm/hwpoison.rst
> @@ -155,6 +155,10 @@ Testing
>  	flag bits are defined in include/linux/kernel-page-flags.h and
>  	documented in Documentation/admin-guide/mm/pagemap.rst
>
> +hwpoisoned-pages
> +	The number of hwpoisoned pages. The hwpoison kernel module can not be
> +	removed unless this count is zero.
> +
>  * Architecture specific MCE injector
>
>    x86 has mce-inject, mce-test
> diff --git a/mm/hwpoison-inject.c b/mm/hwpoison-inject.c
> index 5c0cddd81505..9e522ecedeef 100644
> --- a/mm/hwpoison-inject.c
> +++ b/mm/hwpoison-inject.c
> @@ -10,6 +10,7 @@
>  #include "internal.h"
>
>  static struct dentry *hwpoison_dir;
> +static atomic_t hwpoisoned_pages;
>
>  static int hwpoison_inject(void *data, u64 val)
>  {
> @@ -49,15 +50,28 @@ static int hwpoison_inject(void *data, u64 val)
>  inject:
>  	pr_info("Injecting memory failure at pfn %#lx\n", pfn);
>  	err = memory_failure(pfn, 0);
> +	if (!err) {
> +		WARN_ON(!try_module_get(THIS_MODULE));
> +		atomic_inc(&hwpoisoned_pages);
> +	}

There are a few other interfaces that generate "software-simulated memory
error" events, namely madvise_inject_error() and hard_offline_page_store(),
so you need to handle those code paths as well.

> +
>  	return (err == -EOPNOTSUPP) ? 0 : err;
>  }
>
>  static int hwpoison_unpoison(void *data, u64 val)
>  {
> +	int ret;
> +
>  	if (!capable(CAP_SYS_ADMIN))
>  		return -EPERM;
>
> -	return unpoison_memory(val);
> +	ret = unpoison_memory(val);
> +	if (!ret) {
> +		atomic_dec(&hwpoisoned_pages);
> +		module_put(THIS_MODULE);
> +	}
> +
> +	return ret;
>  }
>
>  DEFINE_DEBUGFS_ATTRIBUTE(hwpoison_fops, NULL, hwpoison_inject, "%lli\n");
> @@ -99,6 +113,9 @@ static int pfn_inject_init(void)
>  	debugfs_create_u64("corrupt-filter-flags-value", 0600, hwpoison_dir,
>  			   &hwpoison_filter_flags_value);
>
> +	debugfs_create_atomic_t("hwpoisoned-pages", 0400, hwpoison_dir,
> +				&hwpoisoned_pages);

I'm not sure how useful this interface is from userspace (is it meant for
controlling a test process?). Do we really need to expose this to userspace?

TBH I feel that another approach like the one below is more desirable:

  - define a new flag in "enum mf_flags" (for example named MF_SW_SIMULATED),
  - set the flag when calling memory_failure() from the three callers
    mentioned above,
  - define a global variable (typed bool) in mm/memory-failure.c to show
    that the system has experienced a real hardware memory error event,
  - once memory_failure() is called without MF_SW_SIMULATED, the new global
    bool variable is set, and afterward unpoison_memory() always fails
    with -EOPNOTSUPP.

Thanks,
Naoya Horiguchi
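
To make the proposed flow concrete, here is a minimal userspace sketch of the
logic, not kernel code. MF_SW_SIMULATED is the example name from the list
above; its bit value, the hw_memory_failure variable, and the model_*()
helpers are all hypothetical stand-ins for memory_failure() and
unpoison_memory().

```c
/* Userspace model only; it mirrors the proposal, not the real kernel code. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag from the proposal; the bit value is illustrative. */
enum mf_flags {
	MF_SW_SIMULATED = 1 << 0,
};

/* Hypothetical global: set once a failure arrives without MF_SW_SIMULATED. */
static bool hw_memory_failure;

/* Stand-in for memory_failure(): only records where the event came from. */
static int model_memory_failure(unsigned long pfn, int flags)
{
	(void)pfn;
	if (!(flags & MF_SW_SIMULATED))
		hw_memory_failure = true;
	return 0;
}

/* Stand-in for unpoison_memory(): refuses once a real error has been seen. */
static int model_unpoison_memory(unsigned long pfn)
{
	(void)pfn;
	if (hw_memory_failure)
		return -EOPNOTSUPP;
	return 0;
}

/* Walks through the behaviour the proposal describes. */
static void model_demo(void)
{
	/* Injected (software-simulated) errors can still be unpoisoned. */
	assert(model_memory_failure(0x1000, MF_SW_SIMULATED) == 0);
	assert(model_unpoison_memory(0x1000) == 0);

	/* After one real hardware error, unpoison is refused from then on. */
	assert(model_memory_failure(0x2000, 0) == 0);
	assert(model_unpoison_memory(0x1000) == -EOPNOTSUPP);
}
```

With this shape the injector module needs neither a page counter nor module
reference counting, because the decision is made inside the memory-failure
code itself.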