Re: [PATCH v3 3/5] mm/memory-failure: improve memory failure action_result messages

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 5/22/2024 7:31 PM, Miaohe Lin wrote:

[..]
+/*
+ * MF_IGNORED - The m-f() handler marks the page as PG_hwpoisoned'ed.
+ * But it could not do more to isolate the page from being accessed again,
+ * nor does it kill the process. This is extremely rare and one of the
+ * potential causes is that the page state has been changed due to
+ * underlying race condition. This is the most severe outcomes.
+ *
+ * MF_FAILED - The m-f() handler marks the page as PG_hwpoisoned'ed. It
+ * should have killed the process, but it can't isolate the page, due to
+ * conditions such as extra pin, unmap failure, etc. Accessing the page
+ * again will trigger another MCE and the process will be killed by the
+ * m-f() handler immediately.
+ *
+ * MF_DELAYED - The m-f() handler marks the page as PG_hwpoisoned'ed. The
+ * page is unmapped, but perhaps remains in LRU or file mapping. An attempt
Would the page remain in LRU or file mapping? IIUC, MF_DELAYED is returned from two functions:
1. me_swapcache_dirty. Page lives in swap cache and removed from LRU.
2. kvm_gmem_error_folio. Page range is unmapped. It seems page won't be in the LRU or page cache.
Or am I miss something?
Agreed, I'll fix the comment.
+ * to access the page again will trigger page fault and the PF handler
+ * will kill the process.
+ *
+ * MF_RECOVERED - The m-f() handler marks the page as PG_hwpoisoned'ed.
+ * The page has been completely isolated, that is, unmapped, taken out of
+ * the buddy system, or hole-punnched out of the file mapping.
+ */
  static const char *action_name[] = {
  	[MF_IGNORED] = "Ignored",
  	[MF_FAILED] = "Failed",
@@ -893,6 +915,7 @@ static const char * const action_page_types[] = {
  	[MF_MSG_DIFFERENT_COMPOUND]	= "different compound page after locking",
  	[MF_MSG_HUGE]			= "huge page",
  	[MF_MSG_FREE_HUGE]		= "free huge page",
+	[MF_MSG_GET_HWPOISON]		= "get hwpoison page",
  	[MF_MSG_UNMAP_FAILED]		= "unmapping failed page",
  	[MF_MSG_DIRTY_SWAPCACHE]	= "dirty swapcache page",
  	[MF_MSG_CLEAN_SWAPCACHE]	= "clean swapcache page",
@@ -906,6 +929,7 @@ static const char * const action_page_types[] = {
  	[MF_MSG_BUDDY]			= "free buddy page",
  	[MF_MSG_DAX]			= "dax page",
  	[MF_MSG_UNSPLIT_THP]		= "unsplit thp",
+	[MF_MSG_ALREADY_POISONED]	= "already poisoned",
  	[MF_MSG_UNKNOWN]		= "unknown page",
  };
@@ -1013,12 +1037,13 @@ static int me_kernel(struct page_state *ps, struct page *p) /*
   * Page in unknown state. Do nothing.
+ * This is a catch-all in case we fail to make sense of the page state.
   */
  static int me_unknown(struct page_state *ps, struct page *p)
  {
  	pr_err("%#lx: Unknown page state\n", page_to_pfn(p));
  	unlock_page(p);
-	return MF_FAILED;
+	return MF_IGNORED;
  }
/*
@@ -2055,6 +2080,8 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
  		if (flags & MF_ACTION_REQUIRED) {
  			folio = page_folio(p);
  			res = kill_accessing_process(current, folio_pfn(folio), flags);
+			action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
+			return res;
We might reuse the below "return res;"?
Yes, will fix.
  		}
  		return res;
Besides from the above possible nits, this patch looks good to me.
Acked-by: Miaohe Lin <linmiaohe@xxxxxxxxxx>
Thanks.
.

Thanks!

-jane






[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux