Re: [PATCH v3] ACPI: APEI: Skip initialization of GHES_ASSIST structures for Machine Check Architecture

On Tue, Jan 23, 2024 at 03:39:49PM -0600, Naik, Avadhut wrote:
> Hi,
> 
> Any further comments on this patch?

No. I like the comments you added to address my earlier
confusion/concerns.

Reviewed-by: Tony Luck <tony.luck@xxxxxxxxx>

-Tony

> 
> On 12/18/2023 11:13 AM, Avadhut Naik wrote:
> > Hi,
> > 
> > Any further feedback on this patch?
> > 
> > On 12/4/2023 13:25, Avadhut Naik wrote:
> >> To support GHES_ASSIST on Machine Check Architecture (MCA) error sources,
> >> a set of GHES structures is provided by the system firmware for each MCA
> >> error source. Each of these sets consists of a GHES structure for each MCA
> >> bank on each logical CPU, with all structures of a set sharing a common
> >> Related Source ID, equal to the Source ID of one of the MCA error source
> >> structures.[1] On SOCs with large core counts, this typically equates to
> >> tens of thousands of GHES_ASSIST structures for MCA under
> >> "/sys/bus/platform/drivers/GHES".
> >>
> >> Support for GHES_ASSIST, however, hasn't been implemented in the kernel. As
> >> such, the information provided through these structures is not consumed by
> >> Linux. Moreover, these GHES_ASSIST structures for MCA, which are supposed
> >> to provide supplemental information in the context of an error reported by
> >> hardware, are set up as independent error sources by the kernel during HEST
> >> initialization.
> >>
> >> Additionally, if the Type field of the Notification structure associated
> >> with these GHES_ASSIST structures for MCA is set to Polled, the kernel
> >> sets up a timer for each individual structure. The duration of the timer
> >> is derived from the Poll Interval field of the Notification structure. On
> >> SOCs with high core counts, this results in tens of thousands of timers
> >> expiring periodically, causing unnecessary preemptions and wasted CPU
> >> cycles. The problem intensifies when the Poll Interval is short.
> >>
> >> Since GHES_ASSIST support is not present in the kernel, skip initialization
> >> of GHES_ASSIST structures for MCA to eliminate their performance impact.
> >>
> >> [1] ACPI specification 6.5, section 18.7
> >>
> >> Signed-off-by: Avadhut Naik <avadhut.naik@xxxxxxx>
> >> Reviewed-by: Yazen Ghannam <yazen.ghannam@xxxxxxx>
> >> ---
> >> Changes in v2:
> >> 1.	Since is_ghes_assist_struct() returns as soon as any of the conditions
> >> is hit, the if-else-if chain is redundant. Replace it with plain if statements.
> >> 2.	Fix formatting errors.
> >> 3.	Add Reviewed-by: Yazen Ghannam <yazen.ghannam@xxxxxxx>
> >>
> >> Changes in v3:
> >> 1. Modify structure (mces) comment, per Tony's recommendation, to better
> >> reflect the structure's usage.
> >> ---
> >>  drivers/acpi/apei/hest.c | 51 ++++++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 51 insertions(+)
> >>
> >> diff --git a/drivers/acpi/apei/hest.c b/drivers/acpi/apei/hest.c
> >> index 6aef1ee5e1bd..20d757687e3d 100644
> >> --- a/drivers/acpi/apei/hest.c
> >> +++ b/drivers/acpi/apei/hest.c
> >> @@ -37,6 +37,20 @@ EXPORT_SYMBOL_GPL(hest_disable);
> >>  
> >>  static struct acpi_table_hest *__read_mostly hest_tab;
> >>  
> >> +/*
> >> + * Since GHES_ASSIST is not supported, skip initialization of GHES_ASSIST
> >> + * structures for MCA.
> >> + * During HEST parsing, detected MCA error sources are cached from early
> >> + * table entries so that the Flags and Source Id fields from these cached
> >> + * values are then referred to in later table entries to determine if the
> >> + * encountered GHES_ASSIST structure should be initialized.
> >> + */
> >> +static struct {
> >> +	struct acpi_hest_ia_corrected *cmc;
> >> +	struct acpi_hest_ia_machine_check *mc;
> >> +	struct acpi_hest_ia_deferred_check *dmc;
> >> +} mces;
> >> +
> >>  static const int hest_esrc_len_tab[ACPI_HEST_TYPE_RESERVED] = {
> >>  	[ACPI_HEST_TYPE_IA32_CHECK] = -1,	/* need further calculation */
> >>  	[ACPI_HEST_TYPE_IA32_CORRECTED_CHECK] = -1,
> >> @@ -70,22 +84,54 @@ static int hest_esrc_len(struct acpi_hest_header *hest_hdr)
> >>  		cmc = (struct acpi_hest_ia_corrected *)hest_hdr;
> >>  		len = sizeof(*cmc) + cmc->num_hardware_banks *
> >>  			sizeof(struct acpi_hest_ia_error_bank);
> >> +		mces.cmc = cmc;
> >>  	} else if (hest_type == ACPI_HEST_TYPE_IA32_CHECK) {
> >>  		struct acpi_hest_ia_machine_check *mc;
> >>  		mc = (struct acpi_hest_ia_machine_check *)hest_hdr;
> >>  		len = sizeof(*mc) + mc->num_hardware_banks *
> >>  			sizeof(struct acpi_hest_ia_error_bank);
> >> +		mces.mc = mc;
> >>  	} else if (hest_type == ACPI_HEST_TYPE_IA32_DEFERRED_CHECK) {
> >>  		struct acpi_hest_ia_deferred_check *mc;
> >>  		mc = (struct acpi_hest_ia_deferred_check *)hest_hdr;
> >>  		len = sizeof(*mc) + mc->num_hardware_banks *
> >>  			sizeof(struct acpi_hest_ia_error_bank);
> >> +		mces.dmc = mc;
> >>  	}
> >>  	BUG_ON(len == -1);
> >>  
> >>  	return len;
> >>  };
> >>  
> >> +/*
> >> + * GHES and GHESv2 structures share the same format, starting from
> >> + * Source Id and ending in Error Status Block Length (inclusive).
> >> + */
> >> +static bool is_ghes_assist_struct(struct acpi_hest_header *hest_hdr)
> >> +{
> >> +	struct acpi_hest_generic *ghes;
> >> +	u16 related_source_id;
> >> +
> >> +	if (hest_hdr->type != ACPI_HEST_TYPE_GENERIC_ERROR &&
> >> +	    hest_hdr->type != ACPI_HEST_TYPE_GENERIC_ERROR_V2)
> >> +		return false;
> >> +
> >> +	ghes = (struct acpi_hest_generic *)hest_hdr;
> >> +	related_source_id = ghes->related_source_id;
> >> +
> >> +	if (mces.cmc && mces.cmc->flags & ACPI_HEST_GHES_ASSIST &&
> >> +	    related_source_id == mces.cmc->header.source_id)
> >> +		return true;
> >> +	if (mces.mc && mces.mc->flags & ACPI_HEST_GHES_ASSIST &&
> >> +	    related_source_id == mces.mc->header.source_id)
> >> +		return true;
> >> +	if (mces.dmc && mces.dmc->flags & ACPI_HEST_GHES_ASSIST &&
> >> +	    related_source_id == mces.dmc->header.source_id)
> >> +		return true;
> >> +
> >> +	return false;
> >> +}
> >> +
> >>  typedef int (*apei_hest_func_t)(struct acpi_hest_header *hest_hdr, void *data);
> >>  
> >>  static int apei_hest_parse(apei_hest_func_t func, void *data)
> >> @@ -114,6 +160,11 @@ static int apei_hest_parse(apei_hest_func_t func, void *data)
> >>  			return -EINVAL;
> >>  		}
> >>  
> >> +		if (is_ghes_assist_struct(hest_hdr)) {
> >> +			hest_hdr = (void *)hest_hdr + len;
> >> +			continue;
> >> +		}
> >> +
> >>  		rc = func(hest_hdr, data);
> >>  		if (rc)
> >>  			return rc;
> >>
> >> base-commit: 629a3b49f3f957e975253c54846090b8d5ed2e9b
> > 
> 
> -- 
> Thanks,
> Avadhut Naik
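
For readers following the thread, the skip logic in the quoted patch can be
modelled in user space. This is only a rough sketch under stated assumptions,
not the kernel code: the model_* types and names are hypothetical stand-ins
for the ACPI table structures, and MODEL_GHES_ASSIST assumes the GHES_ASSIST
flag is bit 2 of the Flags field. It walks a table in order, caches the MCA
entries it has seen, and counts the entries that would still reach the
callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for HEST entry types. */
enum model_type {
	MODEL_MC, MODEL_CMC, MODEL_DMC,
	MODEL_GHES, MODEL_GHES_V2, MODEL_OTHER
};

/* Assumption: GHES_ASSIST is bit 2 of the MCA structure's Flags field. */
#define MODEL_GHES_ASSIST (1u << 2)

struct model_entry {
	enum model_type type;
	uint16_t source_id;
	uint16_t related_source_id;	/* meaningful for GHES entries only */
	uint8_t flags;			/* meaningful for MCA entries only */
};

/* Most recently seen MCA entry of each kind, like 'mces' in the patch. */
struct model_cache {
	const struct model_entry *mc, *cmc, *dmc;
};

/* True if a cached MCA entry advertises GHES_ASSIST and owns this ID. */
static bool model_matches(const struct model_entry *mca, uint16_t related)
{
	return mca && (mca->flags & MODEL_GHES_ASSIST) &&
	       related == mca->source_id;
}

static bool model_is_ghes_assist(const struct model_cache *c,
				 const struct model_entry *e)
{
	if (e->type != MODEL_GHES && e->type != MODEL_GHES_V2)
		return false;

	return model_matches(c->cmc, e->related_source_id) ||
	       model_matches(c->mc, e->related_source_id) ||
	       model_matches(c->dmc, e->related_source_id);
}

/* Walk the table in order; return how many entries are not skipped. */
static size_t model_count_initialized(const struct model_entry *tab, size_t n)
{
	struct model_cache cache = { NULL, NULL, NULL };
	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		const struct model_entry *e = &tab[i];

		if (e->type == MODEL_MC)
			cache.mc = e;
		else if (e->type == MODEL_CMC)
			cache.cmc = e;
		else if (e->type == MODEL_DMC)
			cache.dmc = e;

		if (model_is_ghes_assist(&cache, e))
			continue;	/* supplemental structure: skip it */
		kept++;
	}
	return kept;
}
```

As in the patch, the model only recognizes a GHES_ASSIST structure once the
MCA entry it refers to has been seen earlier in the walk; a GHES entry whose
Related Source ID matches an MCA entry without the flag set is still counted
as an independent error source.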



