Hi Boris,

>-----Original Message-----
>From: Borislav Petkov [mailto:bp@xxxxxxxxx]
>Sent: 31 March 2020 10:09
>To: Shiju Jose <shiju.jose@xxxxxxxxxx>
>Cc: linux-acpi@xxxxxxxxxxxxxxx; linux-pci@xxxxxxxxxxxxxxx;
>linux-kernel@xxxxxxxxxxxxxxx; rjw@xxxxxxxxxxxxx; helgaas@xxxxxxxxxx;
>lenb@xxxxxxxxxx; james.morse@xxxxxxx; tony.luck@xxxxxxxxx;
>gregkh@xxxxxxxxxxxxxxxxxxx; zhangliguang@xxxxxxxxxxxxxxxxx;
>tglx@xxxxxxxxxxxxx; Linuxarm <linuxarm@xxxxxxxxxx>; Jonathan Cameron
><jonathan.cameron@xxxxxxxxxx>; tanxiaofei <tanxiaofei@xxxxxxxxxx>;
>yangyicong <yangyicong@xxxxxxxxxx>
>Subject: Re: [PATCH v6 1/2] ACPI / APEI: Add support to notify the vendor
>specific HW errors
>
>On Mon, Mar 30, 2020 at 03:44:29PM +0000, Shiju Jose wrote:
>> 1. rasdaemon need not print the vendor error data reported by the
>> firmware if the kernel driver already prints that information. In this
>> case rasdaemon will only need to store the decoded vendor error data
>> to the SQL database.
>
>Well, there's a problem with this:
>
>rasdaemon printing != kernel driver printing
>
>Because printing in dmesg would need people to go grep dmesg.
>
>Printing through rasdaemon or any userspace agent, OTOH, is a lot more
>flexible wrt analyzing and collecting those error records. Especially if you are a
>data center admin and you want to collect all your error
>records: grepping dmesg simply doesn't scale versus all the rasdaemon
>agents reporting to a centralized location.

Ok. I posted V7 of this series.
"[v7 PATCH 0/6] ACPI / APEI: Add support to notify non-fatal HW errors"

>
>> 2. If the vendor kernel driver wants to report extra error information
>> through the vendor specific data (though presently we do not have any
>> such use case) for the rasdaemon to log.
>> I think the error handled status useful to indicate that the kernel
>> driver has filled the extra information and rasdaemon to decode and
>> log them after extra data specific validity check.
>
>The kernel driver can report that extra information without the kernel saying
>that the error was handled.
>
>So I still see no sense for the kernel to tell userspace explicitly that it handled
>the error. There might be a valid reason, though, of which I cannot think of
>right now.

Ok.

>
>Thx.
>
>--
>Regards/Gruss,
> Boris.
>
>https://people.kernel.org/tglx/notes-about-netiquette

Thanks,
Shiju