Hi Chris,

On Tue, Jun 7, 2016 at 8:23 AM, chris hyser <chris.hyser@xxxxxxxxxx> wrote:
> Before SPARC T7, the notion of core_siblings was both those CPUs that
> share a common highest level cache and the set of CPUs within a
> particular socket (share the same package_id). This was also true on
> older x86 CPUs, and perhaps on most recent ones, though my knowledge of
> x86 is dated.
>
> The idea of same package_id is stated in Documentation/cputopology.txt,
> and programs such as lscpu have relied upon this to find the number of
> sockets by counting the number of unique core_siblings_list entries. I
> suspect the reliance on that algorithm predates the ability to read
> package IDs directly, which is simpler, more straightforward, and
> preserves the platform-assigned package ID rather than an ID that is
> just an incremented index based on order of discovery.
>
> The idea that core_siblings needs to represent a shared common highest
> level cache comes from irqbalance, an important run-time performance
> enhancing daemon.
>
> irqbalance uses the following hierarchy of locality goodness:
>
>  - shared common core (thread_siblings)
>  - shared common cache (core_siblings)
>  - shared common socket (CPUs with same physical_package_id)
>  - shared common node (CPUs in same node)
>
> This layout perfectly describes the T7 and interestingly suggests that
> there are one or more other architectures that have reached the point
> where enough cores can be jammed into the same package that a shared
> high level cache is either not desirable or not worth the real
> estate/effort. Said differently, socket will in the future likely
> become less synonymous with shared cache and more synonymous with node.
> I'm still digging to see if and what those architectures are.
>
> The issue is that on newer SPARC HW both definitions can no longer be
> true at once, and choosing one over the other will break differing sets
> of code. This can be illustrated as a choice between an unmodified
> lscpu spitting out nonsensical answers (although it currently can do
> that for different, unrelated reasons) or an unmodified irqbalance
> incorrectly making cache-thrashing decisions. The number of important
> programs in each class is unknown, but either way some things will have
> to be fixed. As I believe the whole point of large SPARC servers is
> performance, and the goal of the people on the SPARC mailing list is to
> maximize SPARC Linux performance, I would argue for not breaking what I
> would call the performance class of programs rather than the topology
> description class.
>
> Rationale:
>
>  - Performance-class breakage is harder to diagnose, as it results in
>    lost performance and tracing back to root cause is incredibly
>    difficult. Topology description programs, on the other hand, spit
>    out easily identified nonsense and can be modified in a manner that
>    is actually more straightforward than the current algorithm while
>    preserving architecturally neutral functional correctness (i.e. not
>    hacks/workarounds).
>
> Attached is a working sparc64 patch for redefinition in favor of
> "shared highest level cache" (not intended in its current form for
> actual upstream submission, but to clarify the proposal and allow
> actual testing). I'm seeking feedback on how to proceed here to prevent
> wasted effort fixing the wrong set of userland programs and related
> in-progress patches for SPARC sysfs.
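
As an aside, here is a rough user-space sketch of the lscpu-style
heuristic described above -- counting unique core_siblings_list values
under sysfs to estimate the socket count. It is purely illustrative
(not the actual lscpu code, and not part of the patch), assumes the
standard sysfs topology paths, and keeps error handling minimal:

/* Estimate socket count by counting distinct core_siblings_list
 * values, as described above. Illustrative sketch only.
 */
#include <stdio.h>
#include <string.h>
#include <glob.h>

int main(void)
{
        static char seen[256][256];
        int nseen = 0;
        glob_t g;

        if (glob("/sys/devices/system/cpu/cpu*/topology/core_siblings_list",
                 0, NULL, &g))
                return 1;

        for (size_t i = 0; i < g.gl_pathc; i++) {
                char buf[256] = { 0 };
                FILE *f = fopen(g.gl_pathv[i], "r");
                int j, dup = 0;

                if (!f)
                        continue;
                if (!fgets(buf, sizeof(buf), f)) {
                        fclose(f);
                        continue;
                }
                fclose(f);

                /* Have we already seen this sibling list? */
                for (j = 0; j < nseen; j++)
                        if (!strcmp(seen[j], buf))
                                dup = 1;
                if (!dup && nseen < 256)
                        strcpy(seen[nseen++], buf);
        }
        globfree(&g);

        printf("unique core_siblings_list values: %d\n", nseen);
        return 0;
}

With core_siblings redefined as "shared highest level cache", that count
tracks cache domains rather than sockets on a T7, which is exactly the
lscpu-class breakage being traded off against irqbalance here.
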
> Example results of patch:
>
> Before:
> [root@ca-sparc30 topology]# cat core_siblings_list
> 32-63,128-223
>
> After:
> [root@ca-sparc30 topology]# cat core_siblings_list
> 32-63
>
> diff --git a/arch/sparc/kernel/mdesc.c b/arch/sparc/kernel/mdesc.c
> index 1122886..e1b3893 100644
> --- a/arch/sparc/kernel/mdesc.c
> +++ b/arch/sparc/kernel/mdesc.c
> @@ -597,20 +598,21 @@ static void fill_in_one_cache(cpuinfo_sparc *c, struct mdesc_handle *hp, u64 mp)
>                 c->ecache_line_size = *line_size;
>                 break;
> +       case 3:

Is your patch mangled?

> +               c->l3_cache_size = *size;
> +               c->l3_cache_line_size = *line_size;
> +               break;
> +
>         default:
>                 break;
>         }
> -       if (*level == 1) {
> -               u64 a;
> -
> -               mdesc_for_each_arc(a, hp, mp, MDESC_ARC_TYPE_FWD) {
> -                       u64 target = mdesc_arc_target(hp, a);
> -                       const char *name = mdesc_node_name(hp, target);
> +       mdesc_for_each_arc(a, hp, mp, MDESC_ARC_TYPE_FWD) {
> +               u64 target = mdesc_arc_target(hp, a);
> +               const char *name = mdesc_node_name(hp, target);
> -                       if (!strcmp(name, "cache"))
> -                               fill_in_one_cache(c, hp, target);
> -               }
> +               if (!strcmp(name, "cache"))
> +                       fill_in_one_cache(c, hp, target);
>         }
> }
>
> @@ -645,13 +647,19 @@ static void __mark_core_id(struct mdesc_handle *hp, u64 node,
>         cpu_data(*id).core_id = core_id;
> }
>
> -static void __mark_sock_id(struct mdesc_handle *hp, u64 node,
> -                          int sock_id)
> +static void __mark_max_cache_id(struct mdesc_handle *hp, u64 node,
> +                               int max_cache_id)
> {
>         const u64 *id = mdesc_get_property(hp, node, "id", NULL);
>
> -       if (*id < num_possible_cpus())
> -               cpu_data(*id).sock_id = sock_id;
> +       if (*id < num_possible_cpus()) {
> +               cpu_data(*id).max_cache_id = max_cache_id;
> +
> +               /* On systems without explicit socket descriptions, socket
> +                * is max_cache_id
> +                */
> +               cpu_data(*id).sock_id = max_cache_id;
> +       }
> }
>
> static void mark_core_ids(struct mdesc_handle *hp, u64 mp,
> @@ -660,10 +668,11 @@ static void mark_core_ids(struct mdesc_handle *hp, u64 mp,
>         find_back_node_value(hp, mp, "cpu", __mark_core_id, core_id, 10);
> }
>
> -static void mark_sock_ids(struct mdesc_handle *hp, u64 mp,
> -                         int sock_id)
> +static void mark_max_cache_ids(struct mdesc_handle *hp, u64 mp,
> +                              int max_cache_id)
> {
> -       find_back_node_value(hp, mp, "cpu", __mark_sock_id, sock_id, 10);
> +       find_back_node_value(hp, mp, "cpu", __mark_max_cache_id,
> +                            max_cache_id, 10);
> }
>
> static void set_core_ids(struct mdesc_handle *hp)
> @@ -694,14 +703,15 @@ static void set_core_ids(struct mdesc_handle *hp)
>         }
> }
>
> -static int set_sock_ids_by_cache(struct mdesc_handle *hp, int level)
> +static int set_max_cache_ids_by_cache(struct mdesc_handle *hp,
> +                                     int level)
> {
>         u64 mp;
>         int idx = 1;
>         int fnd = 0;
>
> -       /* Identify unique sockets by looking for cpus backpointed to by
> -        * shared level n caches.
> +       /* Identify unique highest level of shared cache by looking for cpus
> +        * backpointed to by shared level N caches.
>          */
>         mdesc_for_each_node_by_name(hp, mp, "cache") {
>                 const u64 *cur_lvl;
>
> @@ -710,7 +720,7 @@ static int set_sock_ids_by_cache(struct mdesc_handle *hp, int level)
>                 if (*cur_lvl != level)
>                         continue;
>
> -               mark_sock_ids(hp, mp, idx);
> +               mark_max_cache_ids(hp, mp, idx);
>                 idx++;
>                 fnd = 1;
>         }
>
> @@ -745,15 +755,17 @@ static void set_sock_ids(struct mdesc_handle *hp)
> {
>         u64 mp;
>
> +       /* Find the highest level of shared cache which on pre-T7 is also
> +        * the socket.
> +        */
> +       if (!set_max_cache_ids_by_cache(hp, 3))
> +               set_max_cache_ids_by_cache(hp, 2);
> +
>         /* If machine description exposes sockets data use it.
> -        * Otherwise fallback to use shared L3 or L2 caches.
>          */
>         mp = mdesc_node_by_name(hp, MDESC_NODE_NULL, "sockets");
>         if (mp != MDESC_NODE_NULL)
> -               return set_sock_ids_by_socket(hp, mp);
> -
> -       if (!set_sock_ids_by_cache(hp, 3))
> -               set_sock_ids_by_cache(hp, 2);
> +               set_sock_ids_by_socket(hp, mp);
> }
>
> static void mark_proc_ids(struct mdesc_handle *hp, u64 mp, int proc_id)
> diff --git a/arch/sparc/kernel/smp_64.c b/arch/sparc/kernel/smp_64.c
> index 8a6151a..bbe27a4 100644
> --- a/arch/sparc/kernel/smp_64.c
> +++ b/arch/sparc/kernel/smp_64.c
> @@ -1250,8 +1250,11 @@ void smp_fill_in_sib_core_maps(void)
>         for_each_present_cpu(i) {
>                 unsigned int j;
>
> +               cpumask_clear(&cpu_core_sib_map[i]);
> +
>                 for_each_present_cpu(j) {
> -                       if (cpu_data(i).sock_id == cpu_data(j).sock_id)
> +                       if (cpu_data(i).max_cache_id ==
> +                           cpu_data(j).max_cache_id)
>                                 cpumask_set_cpu(j, &cpu_core_sib_map[i]);
>                 }
>         }

--
Julian Calaby

Email: julian.calaby@xxxxxxxxx
Profile: http://www.google.com/profiles/julian.calaby/