Patch "x86/mce: Mark fatal MCE's page as poison to avoid panic in the kdump kernel" has been added to the 6.6-stable tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This is a note to let you know that I've just added the patch titled

    x86/mce: Mark fatal MCE's page as poison to avoid panic in the kdump kernel

to the 6.6-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     x86-mce-mark-fatal-mce-s-page-as-poison-to-avoid-pan.patch
and it can be found in the queue-6.6 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 28ecc6b653f2de33d74a7922d8ed7cdc696c5d85
Author: Zhiquan Li <zhiquan1.li@xxxxxxxxx>
Date:   Thu Oct 26 08:39:03 2023 +0800

    x86/mce: Mark fatal MCE's page as poison to avoid panic in the kdump kernel
    
    [ Upstream commit 9f3b130048bfa2e44a8cfb1b616f826d9d5d8188 ]
    
    Memory errors don't happen very often, especially fatal ones. However,
    in large-scale scenarios such as data centers, that probability
    increases with the amount of machines present.
    
    When a fatal machine check happens, mce_panic() is called based on the
    severity grading of that error. The page containing the error is not
    marked as poison.
    
    However, when kexec is enabled, tools like makedumpfile understand when
    pages are marked as poison and do not touch them so as not to cause
    a fatal machine check exception again while dumping the previous
    kernel's memory.
    
    Therefore, mark the page containing the error as poisoned so that the
    kexec'ed kernel can avoid accessing the page.
    
      [ bp: Rewrite commit message and comment. ]
    
    Co-developed-by: Youquan Song <youquan.song@xxxxxxxxx>
    Signed-off-by: Youquan Song <youquan.song@xxxxxxxxx>
    Signed-off-by: Zhiquan Li <zhiquan1.li@xxxxxxxxx>
    Signed-off-by: Borislav Petkov (AMD) <bp@xxxxxxxxx>
    Reviewed-by: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>
    Link: https://lore.kernel.org/r/20231014051754.3759099-1-zhiquan1.li@xxxxxxxxx
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 6f35f724cc14..20ab11aec60b 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -44,6 +44,7 @@
 #include <linux/sync_core.h>
 #include <linux/task_work.h>
 #include <linux/hardirq.h>
+#include <linux/kexec.h>
 
 #include <asm/intel-family.h>
 #include <asm/processor.h>
@@ -233,6 +234,7 @@ static noinstr void mce_panic(const char *msg, struct mce *final, char *exp)
 	struct llist_node *pending;
 	struct mce_evt_llist *l;
 	int apei_err = 0;
+	struct page *p;
 
 	/*
 	 * Allow instrumentation around external facilities usage. Not that it
@@ -286,6 +288,20 @@ static noinstr void mce_panic(const char *msg, struct mce *final, char *exp)
 	if (!fake_panic) {
 		if (panic_timeout == 0)
 			panic_timeout = mca_cfg.panic_timeout;
+
+		/*
+		 * Kdump skips the poisoned page in order to avoid
+		 * touching the error bits again. Poison the page even
+		 * if the error is fatal and the machine is about to
+		 * panic.
+		 */
+		if (kexec_crash_loaded()) {
+			if (final && (final->status & MCI_STATUS_ADDRV)) {
+				p = pfn_to_online_page(final->addr >> PAGE_SHIFT);
+				if (p)
+					SetPageHWPoison(p);
+			}
+		}
 		panic(msg);
 	} else
 		pr_emerg(HW_ERR "Fake kernel panic: %s\n", msg);




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux