Re: [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, 14 Sep 2024 08:13:21 +0200
Mauro Carvalho Chehab <mchehab+huawei@xxxxxxxxxx> wrote:

> This series add support for injecting generic CPER records.  Such records
> are generated outside QEMU via a provided script.
> 
> On this  version,  the patch reworking the way offsets are calculated were
> split on several other patches, to make one logical change per patch and
> make review easier.
> 
> Despite the number of patches increased from 12 to 21, there is just one
> real new patch (as the other ones are a split from a big change):
> 
>   acpi/generic_event_device: Update GHES migration to cover hest addr

I'm done with this round of review.

Given that the series accumulated a bunch of cleanups,
I'd suggest to move all cleanups/renamings not related
to new HEST lookup and new src id mapping to the beginning
of the series, so once they reviewed they could be split up into
a separate series that could be merged while we are ironing down
the new functionality. 
 

> ---
> 
> v10:
> - Patch 1 split on several patches to make reviews easier;
> - Added a migration patch;
> - CPER QMP command was renamed;
> - Updated some comments to better reflect exact ACPI version;
> - Removed a code to reset acks when OSPM fails to read records;
> - Removed a duplicated config GHES_CPER symbol;
> - There is  now an arch-independent namespace for GHES source IDs;
> - Fixed the size of hest_ghes_notify array when creating tables;
> - acpi-hest.json is now a section of ACPI;
> - QMP command renamed from @ghes-cper to inject-ghes-error.
> 
> v9:
> - Patches reorganized to make easier for reviewers;
> - source ID is now guest-OS specific;
> - Some patches got a revision history since v8;
> - Several minor cleanups.
> 
> v8:
> - Fix one of the BIOS links that were incorrect;
> - Changed mem error internal injection to use a common code;
> - No more hardcoded values for CPER: instead of using just the
>   payload at the QAPI, it now has the full raw CPER there;
> - Error injection script now supports changing fields at the
>   Generic Error Data section of the CPER;
> - Several minor cleanups.
> 
> v7:
> - Change the way offsets are calculated and used on HEST table.
>   Now, it is compatible with migrations as all offsets are relative
>   to the HEST table;
> - GHES interface is now more generic: the entire CPER is sent via
>   QMP, instead of just the payload;
> - Some code cleanups to make the code more robust;
> - The python script now uses QEMUMonitorProtocol class.
> 
> v6:
> - PNP0C33 device creation moved to aml-build.c;
> - acpi_ghes record functions now use ACPI notify parameter,
>   instead of source ID;
> - the number of source IDs is now automatically calculated;
> - some code cleanups and function/var renames;
> - some fixes and cleanups at the error injection script;
> - ghes cper stub now produces an error if cper JSON is not compiled;
> - Offset calculation logic for GHES was refactored;
> - Updated documentation to reflect the GHES allocated size;
> - Added a x-mpidr object for QOM usage;
> - Added a patch making usage of x-mpidr field at ARM injection
>   script;
> 
> v5:
> - CPER guid is now passing as string;
> - raw-data is now passed with base64 encode;
> - Removed several GPIO left-overs from arm/virt.c changes;
> - Lots of cleanups and improvements at the error injection script.
>   It now better handles QMP dialog and doesn't print debug messages.
>   Also, code was split on two modules, to make easier to add more
>   error injection commands.
> 
> v4:
> - CPER generation moved to happen outside QEMU;
> - One patch adding support for mpidr query was removed.
> 
> v3:
> - patch 1 cleanups with some comment changes and adding another place where
>   the poweroff GPIO define should be used. No changes on other patches (except
>   due to conflict resolution).
> 
> v2:
> - added a new patch using a define for GPIO power pin;
> - patch 2 changed to also use a define for generic error GPIO pin;
> - a couple cleanups at patch 2 removing uneeded else clauses.
> 
> Example of generating a CPER record:
> 
> $ scripts/ghes_inject.py -d arm -p 0xdeadbeef
> GUID: e19e3d16-bc11-11e4-9caa-c2051d5d46b0
> Generic Error Status Block (20 bytes):
>       00000000  01 00 00 00 00 00 00 00 00 00 00 00 90 00 00 00   ................
>       00000010  00 00 00 00                                       ....
> 
> Generic Error Data Entry (72 bytes):
>       00000000  16 3d 9e e1 11 bc e4 11 9c aa c2 05 1d 5d 46 b0   .=...........]F.
>       00000010  00 00 00 00 00 03 00 00 48 00 00 00 00 00 00 00   ........H.......
>       00000020  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
>       00000030  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
>       00000040  00 00 00 00 00 00 00 00                           ........
> 
> Payload (72 bytes):
>       00000000  05 00 00 00 01 00 00 00 48 00 00 00 00 00 00 00   ........H.......
>       00000010  00 00 00 80 00 00 00 00 10 05 0f 00 00 00 00 00   ................
>       00000020  00 00 00 00 00 00 00 00 00 20 14 00 02 01 00 03   ......... ......
>       00000030  0f 00 91 00 00 00 00 00 ef be ad de 00 00 00 00   ................
>       00000040  ef be ad de 00 00 00 00                           ........
> 
> Error injected.
> 
> [    9.358364] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
> [    9.359027] {1}[Hardware Error]: event severity: recoverable
> [    9.359586] {1}[Hardware Error]:  Error 0, type: recoverable
> [    9.360124] {1}[Hardware Error]:   section_type: ARM processor error
> [    9.360561] {1}[Hardware Error]:   MIDR: 0x00000000000f0510
> [    9.361160] {1}[Hardware Error]:   Multiprocessor Affinity Register (MPIDR): 0x0000000080000000
> [    9.361643] {1}[Hardware Error]:   running state: 0x0
> [    9.362142] {1}[Hardware Error]:   Power State Coordination Interface state: 0
> [    9.362682] {1}[Hardware Error]:   Error info structure 0:
> [    9.363030] {1}[Hardware Error]:   num errors: 2
> [    9.363656] {1}[Hardware Error]:    error_type: 0x02: cache error
> [    9.364163] {1}[Hardware Error]:    error_info: 0x000000000091000f
> [    9.364834] {1}[Hardware Error]:     transaction type: Data Access
> [    9.365599] {1}[Hardware Error]:     cache error, operation type: Data write
> [    9.366441] {1}[Hardware Error]:     cache level: 2
> [    9.367005] {1}[Hardware Error]:     processor context not corrupted
> [    9.367753] {1}[Hardware Error]:    physical fault address: 0x00000000deadbeef
> [    9.374267] Memory failure: 0xdeadb: recovery action for free buddy page: Recovered
> 
> Such script currently supports arm processor error CPER, but can easily be
> extended to other GHES notification types.
> 
> 
> Mauro Carvalho Chehab (21):
>   acpi/ghes: add a firmware file with HEST address
>   acpi/generic_event_device: Update GHES migration to cover hest addr
>   acpi/ghes: get rid of ACPI_HEST_SRC_ID_RESERVED
>   acpi/ghes: simplify acpi_ghes_record_errors() code
>   acpi/ghes: better handle source_id and notification
>   acpi/ghes: Remove a duplicated out of bounds check
>   acpi/ghes: rework the logic to handle HEST source ID
>   acpi/ghes: Change the type for source_id
>   acpi/ghes: Don't hardcode the number of sources on ghes
>   acpi/ghes: make the GHES record generation more generic
>   acpi/ghes: don't crash QEMU if ghes GED is not found
>   acpi/ghes: rename etc/hardware_error file macros
>   acpi/ghes: better name GHES memory error function
>   acpi/ghes: add a notifier to notify when error data is ready
>   acpi/generic_event_device: add an APEI error device
>   arm/virt: Wire up a GED error device for ACPI / GHES
>   qapi/acpi-hest: add an interface to do generic CPER error injection
>   docs: acpi_hest_ghes: fix documentation for CPER size
>   scripts/ghes_inject: add a script to generate GHES error inject
>   target/arm: add an experimental mpidr arm cpu property object
>   scripts/arm_processor_error.py: retrieve mpidr if not filled
> 
>  MAINTAINERS                            |  10 +
>  docs/specs/acpi_hest_ghes.rst          |   6 +-
>  hw/acpi/Kconfig                        |   5 +
>  hw/acpi/aml-build.c                    |  10 +
>  hw/acpi/generic_event_device.c         |  19 +-
>  hw/acpi/ghes-stub.c                    |   2 +-
>  hw/acpi/ghes.c                         | 312 +++++++----
>  hw/acpi/ghes_cper.c                    |  32 ++
>  hw/acpi/ghes_cper_stub.c               |  19 +
>  hw/acpi/meson.build                    |   2 +
>  hw/arm/virt-acpi-build.c               |  12 +-
>  hw/arm/virt.c                          |  19 +-
>  include/hw/acpi/acpi_dev_interface.h   |   1 +
>  include/hw/acpi/aml-build.h            |   2 +
>  include/hw/acpi/generic_event_device.h |   1 +
>  include/hw/acpi/ghes.h                 |  37 +-
>  include/hw/arm/virt.h                  |   2 +
>  qapi/acpi-hest.json                    |  35 ++
>  qapi/meson.build                       |   1 +
>  qapi/qapi-schema.json                  |   1 +
>  scripts/arm_processor_error.py         | 388 ++++++++++++++
>  scripts/ghes_inject.py                 |  51 ++
>  scripts/qmp_helper.py                  | 702 +++++++++++++++++++++++++
>  target/arm/cpu.c                       |   1 +
>  target/arm/cpu.h                       |   1 +
>  target/arm/helper.c                    |  10 +-
>  target/arm/kvm.c                       |   3 +-
>  27 files changed, 1552 insertions(+), 132 deletions(-)
>  create mode 100644 hw/acpi/ghes_cper.c
>  create mode 100644 hw/acpi/ghes_cper_stub.c
>  create mode 100644 qapi/acpi-hest.json
>  create mode 100644 scripts/arm_processor_error.py
>  create mode 100755 scripts/ghes_inject.py
>  create mode 100644 scripts/qmp_helper.py
> 





[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux