Hi Tyler,

We have tested the V6 patch set on our platform. It worked fine.

Thanks,
Shiju

> -----Original Message-----
> From: Tyler Baicar [mailto:tbaicar@xxxxxxxxxxxxxx]
> Sent: 07 December 2016 21:48
> To: christoffer.dall@xxxxxxxxxx; marc.zyngier@xxxxxxx;
> pbonzini@xxxxxxxxxx; rkrcmar@xxxxxxxxxx; linux@xxxxxxxxxxxxxxx;
> catalin.marinas@xxxxxxx; will.deacon@xxxxxxx; rjw@xxxxxxxxxxxxx;
> lenb@xxxxxxxxxx; matt@xxxxxxxxxxxxxxxxxxx; robert.moore@xxxxxxxxx;
> lv.zheng@xxxxxxxxx; nkaje@xxxxxxxxxxxxxx; zjzhang@xxxxxxxxxxxxxx;
> mark.rutland@xxxxxxx; james.morse@xxxxxxx; akpm@xxxxxxxxxxxxxxxxxxxx;
> eun.taik.lee@xxxxxxxxxxx; sandeepa.s.prabhu@xxxxxxxxx;
> labbott@xxxxxxxxxx; shijie.huang@xxxxxxx; rruigrok@xxxxxxxxxxxxxx;
> paul.gortmaker@xxxxxxxxxxxxx; tn@xxxxxxxxxxxx; fu.wei@xxxxxxxxxx;
> rostedt@xxxxxxxxxxx; bristot@xxxxxxxxxx;
> linux-arm-kernel@xxxxxxxxxxxxxxxxxxx; kvmarm@xxxxxxxxxxxxxxxxxxxxx;
> kvm@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> linux-acpi@xxxxxxxxxxxxxxx; linux-efi@xxxxxxxxxxxxxxx; devel@xxxxxxxxxx;
> Suzuki.Poulose@xxxxxxx; punit.agrawal@xxxxxxx; astone@xxxxxxxxxx;
> harba@xxxxxxxxxxxxxx; hanjun.guo@xxxxxxxxxx; John Garry; Shiju Jose
> Cc: Tyler Baicar
> Subject: [PATCH V6 00/10] Add UEFI 2.6 and ACPI 6.1 updates for RAS on
> ARM64
>
> When a memory error, CPU error, PCIe error, or other type of hardware
> error that's covered by RAS occurs, firmware should populate the shared
> GHES memory location with the proper GHES structures to notify the OS
> of the error.
> For example, platforms that implement firmware-first handling may
> implement separate GHES sources for corrected errors and uncorrected
> errors. If the error is an uncorrectable error, then the firmware will
> notify the OS immediately since the error needs to be handled ASAP. The
> OS will then be able to take the appropriate action needed such as
> offlining a page. If the error is a corrected error, then the firmware
> will not interrupt the OS immediately.
> Instead, the OS will see and report the error the next time its GHES
> timer expires. The kernel will first parse the GHES structures and
> report the errors through the kernel logs and then notify user space
> through RAS trace events. This allows user space applications such as
> RAS Daemon to see the errors and report them however the user desires.
> This patchset extends the kernel functionality for RAS errors based on
> updates in the UEFI 2.6 and ACPI 6.1 specifications.
>
> An example flow from firmware to user space could be:
>
>                  +---------------+
>        +-------->|               |
>        |         | GHES polling  |--+
> +-------------+  |    source     |  |   +---------------+   +------------+
> |             |  +---------------+  |   |  Kernel GHES  |   |            |
> |  Firmware   |                     +-->|  CPER AER and |-->| RAS trace  |
> |             |  +---------------+  |   |  EDAC drivers |   |   event    |
> +-------------+  |               |  |   +---------------+   +------------+
>        |         |   GHES sci    |--+
>        +-------->|    source     |
>                  +---------------+
>
> Add support for Generic Hardware Error Source (GHES) v2, which
> introduces the capability for the OS to acknowledge the consumption of
> the error record generated by the Reliability, Availability and
> Serviceability (RAS) controller.
> This eliminates potential race conditions between the OS and the RAS
> controller.
>
> Add support for the timestamp field added to the Generic Error Data
> Entry v3, allowing the OS to log the time that the error is generated
> by the firmware, rather than the time the error is consumed. This
> improves the correctness of event sequences when analyzing error logs.
> The timestamp is added in ACPI 6.1, reference Table 18-343 Generic
> Error Data Entry.
>
> Add support for ARMv8 Common Platform Error Record (CPER) per UEFI 2.6
> specification. ARMv8 specific processor error information is reported
> as part of the CPER records. This provides more detail for processor
> error logs. This can help describe ARMv8 cache, TLB, and bus errors.
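
For anyone following the timestamp part of the series, here is a minimal
user-space sketch of decoding the Generic Error Data Entry v3 timestamp.
It is not the code from the patches. The byte layout (seconds, minutes,
hours, a flags byte whose bit 0 means "precise", day, month, year,
century, all values in packed BCD) is my reading of the ACPI 6.1 table
referenced above and should be checked against the spec. The names
err_tstamp, bcd_to_bin, decode_tstamp and format_tstamp are made up for
illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical decoded form of the 8-byte firmware timestamp. */
struct err_tstamp {
	unsigned int year;	/* full year: century * 100 + year */
	unsigned int mon, day, hour, min, sec;
	int precise;		/* non-zero if bit 0 of the flags byte is set */
};

/* Convert one packed-BCD byte (two decimal digits) to binary. */
static unsigned int bcd_to_bin(uint8_t v)
{
	return (v >> 4) * 10u + (v & 0x0f);
}

/*
 * Assumed layout (verify against the ACPI 6.1 table):
 *   ts[0] seconds, ts[1] minutes, ts[2] hours, ts[3] flags,
 *   ts[4] day, ts[5] month, ts[6] year, ts[7] century
 */
static struct err_tstamp decode_tstamp(const uint8_t ts[8])
{
	struct err_tstamp t;

	t.sec = bcd_to_bin(ts[0]);
	t.min = bcd_to_bin(ts[1]);
	t.hour = bcd_to_bin(ts[2]);
	t.precise = ts[3] & 0x1;
	t.day = bcd_to_bin(ts[4]);
	t.mon = bcd_to_bin(ts[5]);
	t.year = bcd_to_bin(ts[7]) * 100u + bcd_to_bin(ts[6]);
	return t;
}

/* Render the timestamp as "YYYY-MM-DD hh:mm:ss (precise|imprecise)". */
static int format_tstamp(const struct err_tstamp *t, char *buf, size_t len)
{
	return snprintf(buf, len, "%04u-%02u-%02u %02u:%02u:%02u (%s)",
			t->year, t->mon, t->day, t->hour, t->min, t->sec,
			t->precise ? "precise" : "imprecise");
}
```

Keeping year and century as separate BCD bytes means each byte holds
exactly two decimal digits, so one helper decodes every field.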
>
> Synchronous External Abort (SEA) represents a specific processor error
> condition in ARM systems. A handler is added to recognize SEA errors,
> and a notifier is added to parse and report the errors before the
> process is killed. Refer to section N.2.1.1 in the Common Platform
> Error Record appendix of the UEFI 2.6 specification.
>
> Currently the kernel ignores CPER records that are unrecognized.
> On the other hand, the UEFI spec allows for non-standard (e.g. vendor
> proprietary) error section types in CPER (Common Platform Error
> Record), as defined in section N2.3 of UEFI version 2.5. As a result,
> the user is not able to see hardware error data from non-standard
> sections.
>
> If the Section Type field of a Generic Error Data Entry is
> unrecognized, print out the raw data in the dmesg buffer and add a
> tracepoint for reporting such hardware errors.
>
> Currently, even if an error status block's severity is fatal, the
> kernel does not honor the severity level and does not panic. With the
> firmware-first model, the platform could inform the OS about a fatal
> hardware error through the non-NMI GHES notification type. The OS
> should panic when a hardware error record is received with this
> severity.
>
> Add support to handle SEAs that occur while a KVM guest kernel is
> running. Currently these are unsupported by the guest abort handling.
>
> Depends on: [PATCH v15] acpi, apei, arm64: APEI initial support for
> aarch64.
> https://lkml.org/lkml/2016/12/1/312
>
> V6: Change HEST_TYPE_GENERIC_V2 to IS_HEST_TYPE_GENERIC_V2 for
>     readability
>     Move APEI helper defines from cper.h to ghes.h
>     Add data_len decrement back into print loop
>     Change references to ARMv8 to just ARM
>     Rewrite ARM processor context info parsing
>     Check valid bit of ARM error info field before printing it
>     Add include of linux/uuid.h in ghes.c
>
> V5: Fix GHES goto logic for error conditions
>     Change ghes_do_read_ack to ghes_ack_error
>     Make sure data version check is >= 3
>     Use CPER helper functions in print functions
>     Make handle_guest_sea() dummy function static for arm
>     Add arm to subject line for KVM patch
>
> V4: Add bit offset left shift to read_ack_write value
>     Make HEST generic and generic_v2 structures a union in the ghes
>     structure
>     Move gdata v3 helper functions into ghes.h to avoid duplication
>     Reorder the timestamp print and avoid memcpy
>     Add helper functions for gdata size checking
>     Rename the SEA functions
>     Add helper function for GHES panics
>     Set fru_id to NULL UUID at variable declaration
>     Limit ARM trace event parameters to the needed structures
>     Reorder the ARM trace event variables to save space
>     Add comment for why we don't pass SEAs to the guest when it aborts
>     Move ARM trace event call into GHES driver instead of CPER
>
> V3: Fix unmapped address to the read_ack_register in ghes.c
>     Add helper function to get the proper payload based on generic data
>     entry version
>     Move timestamp print to avoid changing function calls in cper.c
>     Remove patch "arm64: exception: handle instruction abort at current
>     EL" since the el1_ia handler is already added in 4.8
>     Add EFI and ARM64 dependencies for HAVE_ACPI_APEI_SEA
>     Add a new trace event for ARM type errors
>     Add support to handle KVM guest SEAs
>
> V2: Add PSCI state print for the ARMv8 error type.
>     Separate timestamp year into year and century using BCD format.
>     Rebase on top of ACPICA 20160318 release and remove header file
>     changes in include/acpi/actbl1.h.
>     Add panic OS with fatal error status block patch.
>     Add processing of unrecognized CPER error section patches with
>     updates from previous comments. Original patches:
>     https://lkml.org/lkml/2015/9/8/646
>
> V1: https://lkml.org/lkml/2016/2/5/544
>
> Jonathan (Zhixiong) Zhang (1):
>   acpi: apei: panic OS with fatal error status block
>
> Tyler Baicar (9):
>   acpi: apei: read ack upon ghes record consumption
>   ras: acpi/apei: cper: generic error data entry v3 per ACPI 6.1
>   efi: parse ARM processor error
>   arm64: exception: handle Synchronous External Abort
>   acpi: apei: handle SEA notification type for ARMv8
>   efi: print unrecognized CPER section
>   ras: acpi / apei: generate trace event for unrecognized CPER section
>   trace, ras: add ARM processor error trace event
>   arm/arm64: KVM: add guest SEA support
>
>  arch/arm/include/asm/kvm_arm.h       |   1 +
>  arch/arm/include/asm/system_misc.h   |   5 +
>  arch/arm/kvm/mmu.c                   |  18 +++-
>  arch/arm64/Kconfig                   |   1 +
>  arch/arm64/include/asm/kvm_arm.h     |   1 +
>  arch/arm64/include/asm/system_misc.h |  15 +++
>  arch/arm64/mm/fault.c                |  71 ++++++++++--
>  drivers/acpi/apei/Kconfig            |  14 +++
>  drivers/acpi/apei/ghes.c             | 189 +++++++++++++++++++++++++++++---
>  drivers/acpi/apei/hest.c             |   7 +-
>  drivers/firmware/efi/cper.c          | 204 ++++++++++++++++++++++++++++++++---
>  drivers/ras/ras.c                    |   2 +
>  include/acpi/ghes.h                  |  27 ++++-
>  include/linux/cper.h                 |  53 +++++++++
>  include/ras/ras_event.h              | 100 +++++++++++++++++
>  15 files changed, 664 insertions(+), 44 deletions(-)
>
> --
> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
> Technologies, Inc.
> Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a
> Linux Foundation Collaborative Project.
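
As a rough illustration of the read-ack step that the first patch in the
list adds for GHESv2 (acknowledging a record after the OS has consumed
it), below is a small user-space sketch of the read-modify-write on the
acknowledge register. It is not the driver code. The struct
read_ack_desc and the function read_ack_value are hypothetical
stand-ins; only the idea of a preserve mask and a write mask, both
shifted left by the register's bit offset (the V4 changelog item), comes
from the cover letter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified description of a read-ack register field. */
struct read_ack_desc {
	uint8_t bit_offset;	/* where the ack field starts in the register */
	uint64_t preserve;	/* bits to keep, relative to bit_offset */
	uint64_t write;		/* bits to set, relative to bit_offset */
};

/*
 * Compute the value to write back so that firmware sees the record as
 * consumed: keep only the bits selected by the shifted preserve mask,
 * then OR in the shifted write mask. Bits not covered by the shifted
 * preserve mask are cleared.
 */
static uint64_t read_ack_value(uint64_t reg, const struct read_ack_desc *d)
{
	assert(d->bit_offset < 64);	/* a larger shift would be undefined */

	reg &= d->preserve << d->bit_offset;
	reg |= d->write << d->bit_offset;
	return reg;
}
```

With bit_offset = 4, preserve = 0xE and write = 0x1, a register value of
0xA0 becomes 0xB0: bits 5 to 7 are kept and bit 4 is set as the
acknowledgement.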