Hi James, Thanks for the feedback. >-----Original Message----- >From: James Morse [mailto:james.morse@xxxxxxx] >Sent: 06 November 2020 19:34 >To: Shiju Jose <shiju.jose@xxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx; >bp@xxxxxxxxx; tony.luck@xxxxxxxxx; rjw@xxxxxxxxxxxxx; lenb@xxxxxxxxxx; >rrichter@xxxxxxxxxxx >Cc: linux-edac@xxxxxxxxxxxxxxx; linux-acpi@xxxxxxxxxxxxxxx; Linuxarm ><linuxarm@xxxxxxxxxx>; Jonathan Cameron ><jonathan.cameron@xxxxxxxxxx> >Subject: Re: [RFC PATCH 1/4] ACPI: PPTT: Fix for a high level cache node >detected in the low level > >Hi Shiju, Jonathan, > >On 05/11/2020 17:42, Shiju Jose wrote: >> From: Jonathan Cameron <jonathan.cameron@xxxxxxxxxx> >> >> According to the following sections of the PPTT definition in the ACPI >> specification(V6.3), a high level cache node( For example L2 cache) >> could be represented simultaneously both in the private resource of a >> CPU node and via the next_level_of_cache pointer of a low level cache >> node. >> 1. Section 5.2.29.1 Processor hierarchy node structure (Type 0) "Each >> processor node includes a list of resources that are private to that >> node. Resources are described in other PPTT structures such as Type 1 >> cache structures. The processor node’s private resource list includes >> a reference to each of the structures that represent private resources >> to a given processor node. For example, an SoC level processor node >> might contain two references, one pointing to a Level 3 cache resource >> and another pointing to an ID structure." >> >> 2. Section 5.2.29.2 Cache Type Structure - Type 1 >> Figure 5-26 Cache Type Structure - Type 1 Example > >'fix' in the subject makes me twitch ... is there a user-space visible bug >because of this? > > >> For the use case of creating EDAC device blocks for the CPU caches, we >> need to search for cache node types in all levels using >> acpi_find_cache_node(), as a platform independent solution to > >I'm nervous to base the edac user-space view of caches on something other >than what is described in /sys/devices/system/cpu/cpu0/cache. These things >have to match, otherwise user-space can't work out which cpu's L2's it should >add to get the value for the physical cache. With this fix the /sys/devices/system/edac/cpu/cpu0/cacheN match with /sys/devices/system/cpu/cpu0/cache/indexN. and thus user-space could extract cpu list for the shared caches. > >Getting the data from somewhere else risks making this more complicated. > >Using the PPTT means this won't work on "HPE Server"s that use ghes_edac >too. I don't think we should have any arm64 specific behaviour here. > > >> retrieve the cache info from the ACPI PPTT. The reason is that >> cacheinfo in the drivers/base/cacheinfo.c would not be populated in >> this stage. > >Because both ghes_init() and cacheinfo_sysfs_init() are device_initcall()? > >Couldn't we fix this by making ghes_init(), device_initcall_sync() (with a >comment saying what it depends on) I checked by making ghes_init(), device_initcall_sync(). Then the ghes_probe() and ghes_edac_register() are getting called after detect_cache_attributes() of base/cacheinfo.c function is called, where per_cpu_cacheinfo is allocated and populated for the cpu online. Thus cacheinfo data is available for the online CPUs. > > >I agree this means dealing with cpuhp as the cacheinfo data is only available >for online CPUs. Does this require the EDAC device instance for a cpu become online/offline to be added/deleted on cpuhp notify functions in the ghes_edac because the cache structure among CPU cores would vary? If so, I think edac device does not support dynamic addition/deletion of a device instance because edac_device_alloc_ctl_info() pre-allocates memory for the internal edac dev structures for the number of instances(number of CPUs) and number of blocks(number of Caches) passed? > > >> In this case, we found acpi_find_cache_node() mistakenly detecting >> high level cache as low level cache, when the cache node is in the >> processor node’s private resource list. >> >> To fix this issue add duplication check in the >> acpi_find_cache_level(), for a cache node found in the private >> resource of a CPU node with all the next level of caches present in the other >cache nodes. > >I'm not overly familiar with the PPTT, is it possible this issue is visible in >/sys/devices/system/cpu/cpu0/cache? > > >Thanks, > >James > > >> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c index >> 4ae93350b70d..de1dd605d3ad 100644 >> --- a/drivers/acpi/pptt.c >> +++ b/drivers/acpi/pptt.c >> @@ -132,21 +132,80 @@ static unsigned int acpi_pptt_walk_cache(struct >acpi_table_header *table_hdr, >> return local_level; >> } >> >> +/** >> + * acpi_pptt_walk_check_duplicate() - Find the cache resource to >> +check, >> + * is a duplication in the next_level_of_cache pointer of other cache. >> + * @table_hdr: Pointer to the head of the PPTT table >> + * @res: cache resource in the PPTT we want to walk >> + * @res_check: cache resource in the PPTT we want to check for >duplication. >> + * >> + * Given both PPTT resource, verify that they are cache nodes, then >> +walk >> + * down each level of cache @res, and check for the duplication. >> + * >> + * Return: true if duplication found, false otherwise. >> + */ >> +static bool acpi_pptt_walk_check_duplicate(struct acpi_table_header >*table_hdr, >> + struct acpi_subtable_header *res, >> + struct acpi_subtable_header >*res_check) { >> + struct acpi_pptt_cache *cache; >> + struct acpi_pptt_cache *check; >> + >> + if (res->type != ACPI_PPTT_TYPE_CACHE || >> + res_check->type != ACPI_PPTT_TYPE_CACHE) >> + return false; >> + >> + cache = (struct acpi_pptt_cache *)res; >> + check = (struct acpi_pptt_cache *)res_check; >> + while (cache) { >> + if (cache == check) >> + return true; >> + cache = fetch_pptt_cache(table_hdr, cache- >>next_level_of_cache); >> + } >> + >> + return false; >> +} >> + >> static struct acpi_pptt_cache * >> acpi_find_cache_level(struct acpi_table_header *table_hdr, >> struct acpi_pptt_processor *cpu_node, >> unsigned int *starting_level, unsigned int level, >> int type) >> { >> - struct acpi_subtable_header *res; >> + struct acpi_subtable_header *res, *res2; >> unsigned int number_of_levels = *starting_level; >> int resource = 0; >> + int resource2 = 0; >> + bool duplicate = false; >> struct acpi_pptt_cache *ret = NULL; >> unsigned int local_level; >> >> /* walk down from processor node */ >> while ((res = acpi_get_pptt_resource(table_hdr, cpu_node, >resource))) { >> resource++; >> + /* >> + * PPTT definition in the ACPI specification allows a high level >cache >> + * node would be represented simultaneously both in the >private resource >> + * of a CPU node and via the next_level_of_cache pointer of >another cache node, >> + * within the same CPU hierarchy. This resulting >acpi_find_cache_level() >> + * mistakenly detects a higher level cache node in the low >level as well. >> + * >> + * Check a cache node in the private resource of the CPU node >for any >> + * duplication. >> + */ >> + resource2 = 0; >> + duplicate = false; >> + while ((res2 = acpi_get_pptt_resource(table_hdr, cpu_node, >resource2))) { >> + resource2++; >> + if (res2 == res) >> + continue; >> + if (acpi_pptt_walk_check_duplicate(table_hdr, res2, >res)) { >> + duplicate = true; >> + break; >> + } >> + } >> + if (duplicate) >> + continue; >> >> local_level = acpi_pptt_walk_cache(table_hdr, >*starting_level, >> res, &ret, level, type); >> Thanks, Shiju