Re: [RFC PATCH v1] Chapter 3: Add shared L1 Cache

Rob Herring <robh@xxxxxxxxxx> · Mon, 3 Feb 2025 10:44:39 -0600

On Mon, Feb 3, 2025 at 9:45 AM Alireza Sanaee <alireza.sanaee@xxxxxxxxxx> wrote:
>
> On Mon, 3 Feb 2025 08:47:50 -0600
> Rob Herring <robh@xxxxxxxxxx> wrote:
>
> > On Mon, Feb 3, 2025 at 6:05 AM Alireza Sanaee
> > <alireza.sanaee@xxxxxxxxxx> wrote:
> > >
> > > For an L1 cache to be shared between SMT threads, a 'reg' array
> > > must be used. This, however, is not straightforward if every thread
> > > node in the CPU map must refer to a separate CPU node. Therefore,
> > > this patch proposes creating a separate CPU node for every SMT
> > > thread, with the shared L1 cache represented by an extra node.
> >
> > I don't understand why cpu-map is a problem for the SMT case.
> >
> > I don't think this change is necessary.
> >
> > Rob
>
> Hi Rob,
>
> I posted the following patch, which uses a reg array to represent
> threads, allowing them to share resources within a single CPU node
> without requiring an extra l1-cache layer:
> https://lore.kernel.org/all/20250110161057.445-1-alireza.sanaee@xxxxxxxxxx/
>
> From Mark's remarks on the same patch, I learned that the cpu-map node
> in the DT needs each thread to point to its own CPU node entry
> (Documentation/devicetree/bindings/cpu/cpu-topology.txt). If I use the
> reg array, the threads in the CPU map cannot point to corresponding
> CPU nodes, because the threads exist only as entries in the reg array.
>
> You might argue that CPU maps should also be buildable from the
> threads in the reg array, and I actually agree with that. Maybe that
> is the direction I should take instead.

The CPU binding in the spec predates cpu-topology.txt. Yes, that was
originally written for PowerPC, but there's really no good reason for
other architectures to deviate. L1 caches are not the only things
shared; there are also clocks, power-domains, OPPs, etc. The CPU node parsing
functions (e.g. of_get_cpu_node()) are also already designed for
threads to share a CPU node. IMO, we should follow the spec.
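To make the shared-node model concrete, here is a small Python sketch (a userspace model, not kernel code) of the lookup idea: one CPU node lists every SMT thread in 'reg', and looking up a hardware ID yields both the node and the index of the matching entry, similar in spirit to the `thread` out-parameter of of_get_cpu_node(). The CpuNode class, the find_cpu_node() helper, the two-cell address layout (#address-cells = <2>) and the ID values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# Assumption: /cpus uses #address-cells = <2>, so each reg entry is two 32-bit cells.
ADDRESS_CELLS = 2


@dataclass(frozen=True)
class CpuNode:
    """Hypothetical model of a CPU node whose 'reg' lists one entry per SMT thread."""
    name: str
    reg: Tuple[int, ...]  # flattened 32-bit cells

    def thread_ids(self) -> List[int]:
        """Combine each group of ADDRESS_CELLS cells into one hardware ID."""
        ids = []
        for i in range(0, len(self.reg), ADDRESS_CELLS):
            hwid = 0
            for cell in self.reg[i:i + ADDRESS_CELLS]:
                hwid = (hwid << 32) | cell
            ids.append(hwid)
        return ids


def find_cpu_node(nodes: Sequence[CpuNode], hwid: int) -> Optional[Tuple[CpuNode, int]]:
    """Return (node, thread index) for the reg entry matching hwid, or None."""
    for node in nodes:
        for thread, entry in enumerate(node.thread_ids()):
            if entry == hwid:
                return node, thread
    return None


# cpu@0 and cpu@100 each describe two SMT threads in a single node.
cpus = [
    CpuNode("cpu@0", (0x0, 0x0, 0x0, 0x1)),
    CpuNode("cpu@100", (0x0, 0x100, 0x0, 0x101)),
]
node, thread = find_cpu_node(cpus, 0x101)
print(node.name, thread)  # cpu@100 1
```

In this model, a node's thread count and per-thread IDs come from 'reg' alone, which is the information the first cpu-map option below would rely on.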

For cpu-map, there are two choices if there is one CPU node shared by all of a core's threads:
- Don't describe threads in the map and use 'reg' to get any thread info.
- Make shared threads in the map point to the same CPU node.

The latter option could be:

core0 {
  thread0 {
    cpu = <&cpu0>;
  };
  thread1 {
    cpu = <&cpu0>;
  };
};

Or:

core0 {
  thread0 {
    cpu = <&cpu0 0 0>;
  };
  thread1 {
    cpu = <&cpu0 0 1>;
  };
};

Here, "0 0" and "0 1" match the 'reg' address of each thread.
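One way a parser could interpret the two cpu-map forms, sketched in Python under stated assumptions: the phandle has already been resolved to the CPU node's 'reg' cells, any cells following the phandle are compared against the 'reg' entries two at a time (#address-cells = <2>), and a bare `<&cpu0>` names only the node, so the thread index has to come from elsewhere (here, the position of the threadN node). resolve_cpu_map_thread() and its argument shapes are hypothetical, not an existing kernel API.

```python
from typing import Optional, Sequence

# Assumption: /cpus uses #address-cells = <2>.
ADDRESS_CELLS = 2


def resolve_cpu_map_thread(
    reg: Sequence[int], args: Sequence[int], thread_node_index: int
) -> Optional[int]:
    """Return the index of the 'reg' entry a cpu-map thread refers to, or None.

    reg: flattened cells of the referenced CPU node's 'reg' property.
    args: cells following the phandle, e.g. (0, 1) for <&cpu0 0 1>.
    thread_node_index: position of the threadN node under its core.
    """
    entries = [tuple(reg[i:i + ADDRESS_CELLS]) for i in range(0, len(reg), ADDRESS_CELLS)]
    if not args:
        # <&cpu0>: the phandle names the node only, so fall back to threadN order.
        return thread_node_index if thread_node_index < len(entries) else None
    if len(args) != ADDRESS_CELLS:
        return None  # malformed specifier
    try:
        # <&cpu0 0 1>: the args are the thread's own 'reg' address.
        return entries.index(tuple(args))
    except ValueError:
        return None


cpu0_reg = (0, 0, 0, 1)  # two threads: addresses <0 0> and <0 1>
print(resolve_cpu_map_thread(cpu0_reg, (0, 1), thread_node_index=1))  # 1
print(resolve_cpu_map_thread(cpu0_reg, (), thread_node_index=0))  # 0
```

In this model the explicit-address form is self-describing: reordering the threadN nodes does not change which thread each one refers to, whereas the bare-phandle form depends on node order.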

Rob