Hi Chris,

On Tue, Jun 7, 2016 at 8:23 AM, chris hyser <chris.hyser@xxxxxxxxxx> wrote:
> Before SPARC T7, the notion of core_siblings was both those CPUs that
> share a common highest level cache and the set of CPUs within a
> particular socket (share the same package_id). This was also true on
> older x86 CPUs, and perhaps on most recent ones, though my knowledge of
> x86 is dated.
>
> The idea of same package_id is stated in Documentation/cputopology.txt,
> and programs such as lscpu have relied upon this to find the number of
> sockets by counting the number of unique core_siblings_list entries. I
> suspect the reliance on that algorithm predates the ability to read
> package IDs directly, which is simpler, more straightforward, and
> preserves the platform-assigned package ID rather than an ID that is
> just an incremented index based on order of discovery.
>
> The idea that core_siblings needs to represent a shared common highest
> level cache comes from irqbalance, an important run-time performance
> enhancing daemon.
>
> irqbalance uses the following hierarchy of locality goodness:
>
>  - shared common core (thread_siblings)
>  - shared common cache (core_siblings)
>  - shared common socket (CPUs with same physical_package_id)
>  - shared common node (CPUs in same node)
>
> This layout perfectly describes the T7 and interestingly suggests that
> there are one or more other architectures that have reached the point
> where enough cores can be jammed into the same package that a shared
> high level cache is either not desirable or not worth the real
> estate/effort. Said differently, socket will in the future likely
> become less synonymous with shared cache and more synonymous with node.
> I'm still digging to see if and what those architectures are.
>
> The issue is that on newer SPARC HW both definitions can no longer be
> true at once, and choosing one over the other will break differing sets
> of code. This can be illustrated as a choice between an unmodified
> lscpu spitting out nonsensical answers (although it currently can do
> that for different, unrelated reasons) or an unmodified irqbalance
> incorrectly making cache-thrashing decisions. The number of important
> programs in each class is unknown, but either way some things will have
> to be fixed. As I believe the whole point of large SPARC servers is
> performance, and the goal of the people on the SPARC mailing list is to
> maximize SPARC Linux performance, I would argue for not breaking what I
> would call the performance class of programs rather than the topology
> description class.
>
> Rationale:
>
>  - Performance-class breakage is harder to diagnose, as it results in
>    lost performance and tracing back to root cause is incredibly
>    difficult. Topology description programs, on the other hand, spit
>    out easily identified nonsense and can be modified in a manner that
>    is actually more straightforward than the current algorithm while
>    preserving architecturally neutral functional correctness (i.e. not
>    hacks/workarounds).
>
> Attached is a working sparc64 patch for redefinition in favor of
> "shared highest level cache" (not intended in its current form for
> actual upstream submission, but to clarify the proposal and allow
> actual testing). I'm seeking feedback on how to proceed here to prevent
> wasted effort fixing the wrong set of userland programs and related
> in-progress patches for SPARC sysfs.
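
As an aside, here is a rough user-space sketch of the lscpu-style
heuristic described above -- counting unique core_siblings_list values
under sysfs to estimate the socket count. It is purely illustrative
(not the actual lscpu code, and not part of the patch), assumes the
standard sysfs topology paths, and keeps error handling minimal:

/* Estimate socket count by counting distinct core_siblings_list
 * values, as described above. Illustrative sketch only.
 */
#include <stdio.h>
#include <string.h>
#include <glob.h>

int main(void)
{
        static char seen[256][256];
        int nseen = 0;
        glob_t g;

        if (glob("/sys/devices/system/cpu/cpu*/topology/core_siblings_list",
                 0, NULL, &g))
                return 1;

        for (size_t i = 0; i < g.gl_pathc; i++) {
                char buf[256] = { 0 };
                FILE *f = fopen(g.gl_pathv[i], "r");
                int j, dup = 0;

                if (!f)
                        continue;
                if (!fgets(buf, sizeof(buf), f)) {
                        fclose(f);
                        continue;
                }
                fclose(f);

                /* Have we already seen this sibling list? */
                for (j = 0; j < nseen; j++)
                        if (!strcmp(seen[j], buf))
                                dup = 1;
                if (!dup && nseen < 256)
                        strcpy(seen[nseen++], buf);
        }
        globfree(&g);

        printf("unique core_siblings_list values: %d\n", nseen);
        return 0;
}

With core_siblings redefined as "shared highest level cache", that count
tracks cache domains rather than sockets on a T7, which is exactly the
lscpu-class breakage being traded off against irqbalance here.
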
> Example results of patch:
>
> Before:
> [root@ca-sparc30 topology]# cat core_siblings_list
> 32-63,128-223
>
> After:
> [root@ca-sparc30 topology]# cat core_siblings_list
> 32-63
>
> diff --git a/arch/sparc/kernel/mdesc.c b/arch/sparc/kernel/mdesc.c
> index 1122886..e1b3893 100644
> --- a/arch/sparc/kernel/mdesc.c
> +++ b/arch/sparc/kernel/mdesc.c
> @@ -597,20 +598,21 @@ static void fill_in_one_cache(cpuinfo_sparc *c, struct mdesc_handle *hp, u64 mp)
>                 c->ecache_line_size = *line_size;
>                 break;
> +       case 3:

Is your patch mangled?

> +               c->l3_cache_size = *size;
> +               c->l3_cache_line_size = *line_size;
> +               break;
> +
>         default:
>                 break;
>         }
> -       if (*level == 1) {
> -               u64 a;
> -
> -               mdesc_for_each_arc(a, hp, mp, MDESC_ARC_TYPE_FWD) {
> -                       u64 target = mdesc_arc_target(hp, a);
> -                       const char *name = mdesc_node_name(hp, target);
> +       mdesc_for_each_arc(a, hp, mp, MDESC_ARC_TYPE_FWD) {
> +               u64 target = mdesc_arc_target(hp, a);
> +               const char *name = mdesc_node_name(hp, target);
> -                       if (!strcmp(name, "cache"))
> -                               fill_in_one_cache(c, hp, target);
> -               }
> +               if (!strcmp(name, "cache"))
> +                       fill_in_one_cache(c, hp, target);
>         }
> }
>
> @@ -645,13 +647,19 @@ static void __mark_core_id(struct mdesc_handle *hp, u64 node,
>         cpu_data(*id).core_id = core_id;
> }
>
> -static void __mark_sock_id(struct mdesc_handle *hp, u64 node,
> -                          int sock_id)
> +static void __mark_max_cache_id(struct mdesc_handle *hp, u64 node,
> +                               int max_cache_id)
> {
>         const u64 *id = mdesc_get_property(hp, node, "id", NULL);
>
> -       if (*id < num_possible_cpus())
> -               cpu_data(*id).sock_id = sock_id;
> +       if (*id < num_possible_cpus()) {
> +               cpu_data(*id).max_cache_id = max_cache_id;
> +
> +               /* On systems without explicit socket descriptions, socket
> +                * is max_cache_id
> +                */
> +               cpu_data(*id).sock_id = max_cache_id;
> +       }
> }
>
> static void mark_core_ids(struct mdesc_handle *hp, u64 mp,
> @@ -660,10 +668,11 @@ static void mark_core_ids(struct mdesc_handle *hp, u64 mp,
>         find_back_node_value(hp, mp, "cpu", __mark_core_id, core_id, 10);
> }
>
> -static void mark_sock_ids(struct mdesc_handle *hp, u64 mp,
> -                         int sock_id)
> +static void mark_max_cache_ids(struct mdesc_handle *hp, u64 mp,
> +                              int max_cache_id)
> {
> -       find_back_node_value(hp, mp, "cpu", __mark_sock_id, sock_id, 10);
> +       find_back_node_value(hp, mp, "cpu", __mark_max_cache_id,
> +                            max_cache_id, 10);
> }
>
> static void set_core_ids(struct mdesc_handle *hp)
> @@ -694,14 +703,15 @@ static void set_core_ids(struct mdesc_handle *hp)
>         }
> }
>
> -static int set_sock_ids_by_cache(struct mdesc_handle *hp, int level)
> +static int set_max_cache_ids_by_cache(struct mdesc_handle *hp,
> +                                     int level)
> {
>         u64 mp;
>         int idx = 1;
>         int fnd = 0;
>
> -       /* Identify unique sockets by looking for cpus backpointed to by
> -        * shared level n caches.
> +       /* Identify unique highest level of shared cache by looking for cpus
> +        * backpointed to by shared level N caches.
>          */
>         mdesc_for_each_node_by_name(hp, mp, "cache") {
>                 const u64 *cur_lvl;
>
> @@ -710,7 +720,7 @@ static int set_sock_ids_by_cache(struct mdesc_handle *hp, int level)
>                 if (*cur_lvl != level)
>                         continue;
>
> -               mark_sock_ids(hp, mp, idx);
> +               mark_max_cache_ids(hp, mp, idx);
>                 idx++;
>                 fnd = 1;
>         }
>
> @@ -745,15 +755,17 @@ static void set_sock_ids(struct mdesc_handle *hp)
> {
>         u64 mp;
>
> +       /* Find the highest level of shared cache which on pre-T7 is also
> +        * the socket.
> +        */
> +       if (!set_max_cache_ids_by_cache(hp, 3))
> +               set_max_cache_ids_by_cache(hp, 2);
> +
>         /* If machine description exposes sockets data use it.
> -        * Otherwise fallback to use shared L3 or L2 caches.
>          */
>         mp = mdesc_node_by_name(hp, MDESC_NODE_NULL, "sockets");
>         if (mp != MDESC_NODE_NULL)
> -               return set_sock_ids_by_socket(hp, mp);
> -
> -       if (!set_sock_ids_by_cache(hp, 3))
> -               set_sock_ids_by_cache(hp, 2);
> +               set_sock_ids_by_socket(hp, mp);
> }
>
> static void mark_proc_ids(struct mdesc_handle *hp, u64 mp, int proc_id)
> diff --git a/arch/sparc/kernel/smp_64.c b/arch/sparc/kernel/smp_64.c
> index 8a6151a..bbe27a4 100644
> --- a/arch/sparc/kernel/smp_64.c
> +++ b/arch/sparc/kernel/smp_64.c
> @@ -1250,8 +1250,11 @@ void smp_fill_in_sib_core_maps(void)
>         for_each_present_cpu(i) {
>                 unsigned int j;
>
> +               cpumask_clear(&cpu_core_sib_map[i]);
> +
>                 for_each_present_cpu(j) {
> -                       if (cpu_data(i).sock_id == cpu_data(j).sock_id)
> +                       if (cpu_data(i).max_cache_id ==
> +                           cpu_data(j).max_cache_id)
>                                 cpumask_set_cpu(j, &cpu_core_sib_map[i]);
>                 }
>         }

--
Julian Calaby

Email: julian.calaby@xxxxxxxxx
Profile: http://www.google.com/profiles/julian.calaby/