Re: [PATCH v4 1/2] mm/memory-failure: introduce "hwpoisoned-pages" entry

On Tue, Jun 14, 2022 at 12:38:29PM +0800, zhenwei pi wrote:
> Add a new debug entry to show the number of hwpoisoned pages. And
> use module_get/module_put to manager this kernel module, don't allow
> to remove this module unless hwpoisoned-pages is zero.
> 
> Signed-off-by: zhenwei pi <pizhenwei@xxxxxxxxxxxxx>
> ---
>  Documentation/vm/hwpoison.rst |  4 ++++
>  mm/hwpoison-inject.c          | 19 ++++++++++++++++++-
>  2 files changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/vm/hwpoison.rst b/Documentation/vm/hwpoison.rst
> index c742de1769d1..c832a8b192d4 100644
> --- a/Documentation/vm/hwpoison.rst
> +++ b/Documentation/vm/hwpoison.rst
> @@ -155,6 +155,10 @@ Testing
>  	flag bits are defined in include/linux/kernel-page-flags.h and
>  	documented in Documentation/admin-guide/mm/pagemap.rst
>  
> +  hwpoisoned-pages

This name looks a bit odd to me. IIUC, it counts **software**-poisoned
pages rather than **hardware**-poisoned ones, so the "hw" prefix may not
be suitable. How about "poisoned-pages" (slightly simpler), "poisoned-pfns"
(consistent with "corrupt-pfn" and "unpoison-pfn"), or "swpoisoned-pages"
(where "sw" means software)?

> +	The number of hwpoisoned pages. The hwpoison kernel module can not be
> +	removed unless this count is zero.
> +
>  * Architecture specific MCE injector
>  
>    x86 has mce-inject, mce-test
> diff --git a/mm/hwpoison-inject.c b/mm/hwpoison-inject.c
> index 5c0cddd81505..9e522ecedeef 100644
> --- a/mm/hwpoison-inject.c
> +++ b/mm/hwpoison-inject.c
> @@ -10,6 +10,7 @@
>  #include "internal.h"
>  
>  static struct dentry *hwpoison_dir;
> +static atomic_t hwpoisoned_pages;
>  
>  static int hwpoison_inject(void *data, u64 val)
>  {
> @@ -49,15 +50,28 @@ static int hwpoison_inject(void *data, u64 val)
>  inject:
>  	pr_info("Injecting memory failure at pfn %#lx\n", pfn);
>  	err = memory_failure(pfn, 0);
> +	if (!err) {
> +		WARN_ON(!try_module_get(THIS_MODULE));

__module_get() is enough since we already hold a refcount at open time.
This WARN_ON() will not be triggered unless something unexpected happens.
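
To illustrate the difference, here is a minimal userspace sketch (not kernel
code). All model_* names are hypothetical; it only models the refcount rule
as I understand it: try_module_get() can fail once the module is going away,
so its result must be checked, while __module_get() is unconditional and
assumes the caller already holds a reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of a module refcount; not kernel code. */
struct model_module {
	int refcnt;	/* references currently held */
	bool going;	/* set once unloading has started */
};

/* Models try_module_get(): can fail, so the caller must check the result. */
static bool model_try_get(struct model_module *m)
{
	if (m->going)
		return false;
	m->refcnt++;
	return true;
}

/*
 * Models __module_get(): unconditional. The assert is this model's own
 * check that the caller already holds a reference (e.g. from open).
 */
static void model_get(struct model_module *m)
{
	assert(m->refcnt > 0);
	m->refcnt++;
}

/* Models module_put(): drops one reference. */
static void model_put(struct model_module *m)
{
	assert(m->refcnt > 0);
	m->refcnt--;
}

/* Models an unload attempt: refused while any reference is held. */
static bool model_try_unload(struct model_module *m)
{
	if (m->refcnt > 0)
		return false;
	m->going = true;
	return true;
}
```

In this model, the open path would use model_try_get(), a successful
injection would use model_get(), and a successful unpoison would use
model_put(), so unloading is refused until every injected page has been
unpoisoned.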

> +		atomic_inc(&hwpoisoned_pages);
> +	}
> +
>  	return (err == -EOPNOTSUPP) ? 0 : err;
>  }
>  
>  static int hwpoison_unpoison(void *data, u64 val)
>  {
> +	int ret;
> +
>  	if (!capable(CAP_SYS_ADMIN))
>  		return -EPERM;
>  
> -	return unpoison_memory(val);
> +	ret = unpoison_memory(val);
> +	if (!ret) {
> +		atomic_dec(&hwpoisoned_pages);
> +		module_put(THIS_MODULE);
> +	}
> +
> +	return ret;
>  }
>  
>  DEFINE_DEBUGFS_ATTRIBUTE(hwpoison_fops, NULL, hwpoison_inject, "%lli\n");
> @@ -99,6 +113,9 @@ static int pfn_inject_init(void)
>  	debugfs_create_u64("corrupt-filter-flags-value", 0600, hwpoison_dir,
>  			   &hwpoison_filter_flags_value);
>  
> +	debugfs_create_atomic_t("hwpoisoned-pages", 0400, hwpoison_dir,
> +			   &hwpoisoned_pages);
> +
>  #ifdef CONFIG_MEMCG
>  	debugfs_create_u64("corrupt-filter-memcg", 0600, hwpoison_dir,
>  			   &hwpoison_filter_memcg);
> -- 
> 2.20.1
> 
>