On Wed, Jun 7, 2017 at 11:49 AM, Toshi Kani <toshi.kani@xxxxxxx> wrote:
> ACPI 6.2 defines a new ACPI notification value to NVDIMM Root Device
> in Table 5-169.
>
> 0x81 Unconsumed Uncorrectable Memory Error Detected
>      Used to pro-actively notify OSPM of uncorrectable memory errors
>      detected (for example a memory scrubbing engine that continuously
>      scans the NVDIMMs memory). This is an optional notification. Only
>      locations that were mapped in to SPA by the platform will generate
>      a notification.
>
> Add support of this notification value by initiating an ARS scan. This
> will find new error locations and add their badblocks information.
>
> Link: http://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf
> Signed-off-by: Toshi Kani <toshi.kani@xxxxxxx>
> Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
> Cc: Rafael J. Wysocki <rjw@xxxxxxxxxxxxx>
> Cc: Vishal Verma <vishal.l.verma@xxxxxxxxx>
> ---
>  drivers/acpi/nfit/core.c | 28 ++++++++++++++++++++++------
>  drivers/acpi/nfit/nfit.h | 1 +
>  2 files changed, 23 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
> index 656acb5..cc22778 100644
> --- a/drivers/acpi/nfit/core.c
> +++ b/drivers/acpi/nfit/core.c
> @@ -2967,7 +2967,7 @@ static int acpi_nfit_remove(struct acpi_device *adev)
>  	return 0;
>  }
>
> -void __acpi_nfit_notify(struct device *dev, acpi_handle handle, u32 event)
> +static void acpi_nfit_update_notify(struct device *dev, acpi_handle handle)
>  {
>  	struct acpi_nfit_desc *acpi_desc = dev_get_drvdata(dev);
>  	struct acpi_buffer buf = { ACPI_ALLOCATE_BUFFER, NULL };
> @@ -2975,11 +2975,6 @@ void __acpi_nfit_notify(struct device *dev, acpi_handle handle, u32 event)
>  	acpi_status status;
>  	int ret;
>
> -	dev_dbg(dev, "%s: event: %d\n", __func__, event);
> -
> -	if (event != NFIT_NOTIFY_UPDATE)
> -		return;
> -
>  	if (!dev->driver) {
>  		/* dev->driver may be null if we're being removed */
>  		dev_dbg(dev, "%s: no driver found for dev\n", __func__);
> @@ -3016,6 +3011,27 @@ void __acpi_nfit_notify(struct device *dev, acpi_handle handle, u32 event)
>  		dev_err(dev, "Invalid _FIT\n");
>  	kfree(buf.pointer);
>  }
> +
> +static void acpi_nfit_uc_error_notify(struct device *dev, acpi_handle handle)
> +{
> +	struct acpi_nfit_desc *acpi_desc = dev_get_drvdata(dev);
> +
> +	acpi_nfit_ars_rescan(acpi_desc);

I wonder if we should gate re-scanning with a check similar to the one
we do in the mce notification case:

	if (acpi_desc->scrub_mode == HW_ERROR_SCRUB_ON)

Maybe not, since we don't get an indication of where the error is
without a rescan. However, at a minimum I think we need support for the
new Start ARS flag ("If set to 1 the firmware shall return data from a
previous scrub, if any, without starting a new scrub") and should use
it for this case.

Another thing that seems to be missing in both this and the mce case is
a notification to userspace that something changed. We have calls to
sysfs_notify_dirent() to notify scrub completion events and DIMM health
status change events; I think we need a similar notifier mechanism for
new uncorrectable errors.
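
To make the gating question and the userspace-notification point easier to discuss, here is a rough userspace model of the dispatch, not kernel code and not a proposed patch. Everything prefixed with model_ or MODEL_ is hypothetical. The value 0x80 for NFIT_NOTIFY_UPDATE is my assumption, while 0x81 comes from the ACPI table quoted above. HW_ERROR_SCRUB_OFF is assumed as the counterpart of HW_ERROR_SCRUB_ON. The rescan is modelled as synchronous, and the counter bump merely stands in for a sysfs_notify_dirent() call.

```c
#include <stdbool.h>

/* Stand-ins for driver definitions; the values are assumptions except 0x81. */
#define NFIT_NOTIFY_UPDATE    0x80 /* assumed value */
#define MODEL_NOTIFY_UC_ERROR 0x81 /* from ACPI 6.2 Table 5-169 as quoted */

enum model_scrub_mode { HW_ERROR_SCRUB_OFF, HW_ERROR_SCRUB_ON };

/* Hypothetical, heavily reduced stand-in for struct acpi_nfit_desc. */
struct model_desc {
	enum model_scrub_mode scrub_mode;
	int rescans_started;    /* times the stubbed ARS rescan ran */
	int user_notifications; /* times userspace would have been poked */
};

/* Stub for acpi_nfit_ars_rescan(): only records that it was called. */
static void model_ars_rescan(struct model_desc *d)
{
	d->rescans_started++;
}

/* Stub for a sysfs_notify_dirent()-style wakeup of userspace pollers. */
static void model_notify_userspace(struct model_desc *d)
{
	d->user_notifications++;
}

/* Handle the 0x81 event, optionally gated on the scrub mode. */
static void model_uc_error_notify(struct model_desc *d, bool gate_on_scrub_mode)
{
	if (gate_on_scrub_mode && d->scrub_mode != HW_ERROR_SCRUB_ON)
		return; /* gated: the event is dropped, error location stays unknown */

	model_ars_rescan(d);
	model_notify_userspace(d);
}

/* Dispatch on the notification value, mirroring the split in the patch. */
void model_notify(struct model_desc *d, unsigned int event, bool gate_on_scrub_mode)
{
	switch (event) {
	case NFIT_NOTIFY_UPDATE:
		/* _FIT re-evaluation path, not modelled here */
		break;
	case MODEL_NOTIFY_UC_ERROR:
		model_uc_error_notify(d, gate_on_scrub_mode);
		break;
	default:
		break; /* unknown events are ignored */
	}
}
```

With gating enabled and scrub_mode off, the 0x81 event is silently dropped in this model, which is the downside noted above: without a rescan there is no indication of where the error is. In the real driver the scan runs asynchronously, so the userspace notification would more plausibly belong wherever new badblocks are actually recorded.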