Re: [PATCH v2 1/2] ACPI, APEI, GHES: Remove strict check for memory error handling

On Tue, Nov 26, 2013 at 04:31:36AM -0500, Chen, Gong wrote:
> Date:	Tue, 26 Nov 2013 04:31:36 -0500
> From: "Chen, Gong" <gong.chen@xxxxxxxxxxxxxxx>
> To: "Naveen N. Rao" <naveen.n.rao@xxxxxxxxxxxxxxxxxx>
> Cc: tony.luck@xxxxxxxxx, bp@xxxxxxxxx, linux-acpi@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH v2 1/2] ACPI, APEI, GHES: Remove strict check for
>  memory error handling
> User-Agent: Mutt/1.5.21 (2010-09-15)
> 
> On Tue, Nov 26, 2013 at 02:32:53PM +0530, Naveen N. Rao wrote:
> > Date: Tue, 26 Nov 2013 14:32:53 +0530
> > From: "Naveen N. Rao" <naveen.n.rao@xxxxxxxxxxxxxxxxxx>
> > To: "Chen, Gong" <gong.chen@xxxxxxxxxxxxxxx>, tony.luck@xxxxxxxxx,
> >  bp@xxxxxxxxx
> > CC: linux-acpi@xxxxxxxxxxxxxxx
> > Subject: Re: [PATCH v2 1/2] ACPI, APEI, GHES: Remove strict check for
> >  memory error handling
> > User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101
> >  Thunderbird/24.1.0
> > 
> > On 11/25/2013 12:45 PM, Chen, Gong wrote:
> > >SCI is usually employed to handle corrected errors, especially
> > >corrected memory errors, but in fact SCI can also be used to
> > >handle any error, such as an uncorrected memory error or even a
> > >fatal error, if the BIOS enables it. Errors reported this way
> > >should be logged, too.
> > >
> > >v1 -> v2: make the event record more precise
> > >
> > >Signed-off-by: Chen, Gong <gong.chen@xxxxxxxxxxxxxxx>
> > >---
> > >  arch/x86/kernel/cpu/mcheck/mce-apei.c | 10 +++++++---
> > >  drivers/acpi/apei/ghes.c              |  3 +--
> > >  2 files changed, 8 insertions(+), 5 deletions(-)
> > >
> > >diff --git a/arch/x86/kernel/cpu/mcheck/mce-apei.c b/arch/x86/kernel/cpu/mcheck/mce-apei.c
> > >index de8b60a..d137ab8 100644
> > >--- a/arch/x86/kernel/cpu/mcheck/mce-apei.c
> > >+++ b/arch/x86/kernel/cpu/mcheck/mce-apei.c
> > >@@ -33,6 +33,7 @@
> > >  #include <linux/acpi.h>
> > >  #include <linux/cper.h>
> > >  #include <acpi/apei.h>
> > >+#include <acpi/ghes.h>
> > >  #include <asm/mce.h>
> > >
> > >  #include "mce-internal.h"
> > >@@ -41,14 +42,17 @@ void apei_mce_report_mem_error(int corrected, struct cper_sec_mem_err *mem_err)
> > >  {
> > >  	struct mce m;
> > >
> > >-	/* Only corrected MC is reported */
> > >-	if (!corrected || !(mem_err->validation_bits & CPER_MEM_VALID_PA))
> > >+	if (!(mem_err->validation_bits & CPER_MEM_VALID_PA))
> > >  		return;
> > >
> > >  	mce_setup(&m);
> > >  	m.bank = 1;
> > >-	/* Fake a memory read corrected error with unknown channel */
> > >+	/* Fake a memory read error with unknown channel */
> > >  	m.status = MCI_STATUS_VAL | MCI_STATUS_EN | MCI_STATUS_ADDRV | 0x9f;
> > >+	if (corrected >= GHES_SEV_RECOVERABLE)
> > >+		m.status |= MCI_STATUS_UC;
> > >+	if (corrected >= GHES_SEV_PANIC)
> > >+		m.status |= MCI_STATUS_PCC;
> > 
> > > Hmm... so you only fill in the most basic information from the CPER
> > > record. In the absence of the 'S' and 'AR' bits, I am not sure how useful
> > this is - except for logging the error through /dev/mcelog for
> > legacy users. If that is the intent, you have my
> > 
> > Acked-by: Naveen N. Rao <naveen.n.rao@xxxxxxxxxxxxxxxxxx>
> > 
> > 
> > - Naveen
> > 
> 
> Thanks for your ACK. We would like to record more information, but
> UEFI/CPER is essentially unrelated to MCE, so we can't derive all
> the information needed to construct a full MCE record. IOW, we can
> only fill in the most valuable information, such as the physical
> address, and fake the other fields. From this point of view, this
> kind of H/W error reporting method is still not perfect.

Hi Boris,

Will you pick up this patch in your RAS pull request?
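
For anyone following the thread, here is a minimal standalone sketch of
the status-word construction in the hunk quoted above. It is not the
kernel code itself: the helper name fake_mem_err_status() is
hypothetical, the parameter is called "severity" here although the
patch still names it "corrected", and the bit positions and GHES_SEV_*
values are copied in for illustration on the assumption that they match
arch/x86/include/asm/mce.h and include/acpi/ghes.h. The kernel headers
remain authoritative.

```c
#include <stdint.h>

/* Illustrative copies of the MCi_STATUS bits (assumed to match asm/mce.h). */
#define MCI_STATUS_VAL   (1ULL << 63)  /* valid error */
#define MCI_STATUS_UC    (1ULL << 61)  /* uncorrected error */
#define MCI_STATUS_EN    (1ULL << 60)  /* error reporting enabled */
#define MCI_STATUS_ADDRV (1ULL << 58)  /* address field valid */
#define MCI_STATUS_PCC   (1ULL << 57)  /* processor context corrupt */

/* Illustrative copies of the GHES severities (assumed to match acpi/ghes.h). */
enum {
	GHES_SEV_NO          = 0,
	GHES_SEV_CORRECTED   = 1,
	GHES_SEV_RECOVERABLE = 2,
	GHES_SEV_PANIC       = 3,
};

/*
 * Hypothetical helper: build the faked status word for a memory read
 * error with unknown channel (low bits 0x9f), then escalate it
 * according to the GHES severity, as the patch does.
 */
static uint64_t fake_mem_err_status(int severity)
{
	uint64_t status = MCI_STATUS_VAL | MCI_STATUS_EN |
			  MCI_STATUS_ADDRV | 0x9f;

	if (severity >= GHES_SEV_RECOVERABLE)
		status |= MCI_STATUS_UC;   /* no longer a corrected error */
	if (severity >= GHES_SEV_PANIC)
		status |= MCI_STATUS_PCC;  /* fatal: context is corrupt */

	return status;
}
```

With this mapping a corrected error keeps the original status word, a
recoverable error additionally sets UC, and a panic-level error sets
both UC and PCC. The 'S' and 'AR' bits Naveen mentions are never set.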
