Re: [PATCH] Add support of NVDIMM memory error notification in ACPI 6.2

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jun 7, 2017 at 11:49 AM, Toshi Kani <toshi.kani@xxxxxxx> wrote:
> ACPI 6.2 defines a new ACPI notification value to NVDIMM Root Device
> in Table 5-169.
>
>  0x81 Unconsumed Uncorrectable Memory Error Detected
>       Used to pro-actively notify OSPM of uncorrectable memory errors
>       detected (for example a memory scrubbing engine that continuously
>       scans the NVDIMMs memory). This is an optional notification. Only
>       locations that were mapped in to SPA by the platform will generate
>       a notification.
>
> Add support of this notification value by initiating an ARS scan. This
> will find new error locations and add their badblocks information.
>
> Link: http://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf
> Signed-off-by: Toshi Kani <toshi.kani@xxxxxxx>
> Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
> Cc: Rafael J. Wysocki <rjw@xxxxxxxxxxxxx>
> Cc: Vishal Verma <vishal.l.verma@xxxxxxxxx>
> ---
>  drivers/acpi/nfit/core.c |   28 ++++++++++++++++++++++------
>  drivers/acpi/nfit/nfit.h |    1 +
>  2 files changed, 23 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
> index 656acb5..cc22778 100644
> --- a/drivers/acpi/nfit/core.c
> +++ b/drivers/acpi/nfit/core.c
> @@ -2967,7 +2967,7 @@ static int acpi_nfit_remove(struct acpi_device *adev)
>         return 0;
>  }
>
> -void __acpi_nfit_notify(struct device *dev, acpi_handle handle, u32 event)
> +static void acpi_nfit_update_notify(struct device *dev, acpi_handle handle)
>  {
>         struct acpi_nfit_desc *acpi_desc = dev_get_drvdata(dev);
>         struct acpi_buffer buf = { ACPI_ALLOCATE_BUFFER, NULL };
> @@ -2975,11 +2975,6 @@ void __acpi_nfit_notify(struct device *dev, acpi_handle handle, u32 event)
>         acpi_status status;
>         int ret;
>
> -       dev_dbg(dev, "%s: event: %d\n", __func__, event);
> -
> -       if (event != NFIT_NOTIFY_UPDATE)
> -               return;
> -
>         if (!dev->driver) {
>                 /* dev->driver may be null if we're being removed */
>                 dev_dbg(dev, "%s: no driver found for dev\n", __func__);
> @@ -3016,6 +3011,27 @@ void __acpi_nfit_notify(struct device *dev, acpi_handle handle, u32 event)
>                 dev_err(dev, "Invalid _FIT\n");
>         kfree(buf.pointer);
>  }
> +
> +static void acpi_nfit_uc_error_notify(struct device *dev, acpi_handle handle)
> +{
> +       struct acpi_nfit_desc *acpi_desc = dev_get_drvdata(dev);
> +
> +       acpi_nfit_ars_rescan(acpi_desc);

I wonder if we should gate re-scanning with a similar:

    if (acpi_desc->scrub_mode == HW_ERROR_SCRUB_ON)

...check that we do in the mce notification case? Maybe not since we
don't get an indication of where the error is without a rescan.
However, at a minimum I think we need support for the new Start ARS
flag ("If set to 1 the firmware shall return data from a previous
scrub, if any, without starting a new scrub") and use that for this
case.

Another thing that seems to be missing in both this and the mce case
is a notification to userspace that something changed. We have calls
to sysfs_notify_dirent() to notify scrub completion events and DIMM
health status change events, I think we need a similar notifier
mechanism for new un-correctable errors.
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux IBM ACPI]     [Linux Power Management]     [Linux Kernel]     [Linux Laptop]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]

  Powered by Linux