On Thu, 13 Mar 2025 13:30:58 -0400 Gregory Price <gourry@xxxxxxxxxx> wrote:

> On Thu, Mar 13, 2025 at 04:55:39PM +0000, Jonathan Cameron wrote:
> >
> > Maybe ignore Generic Initiators for this doc. They are relevant for
> > CXL but in the fabric they only matter for type 1 / 2 devices not
> > memory and only if the BIOS wants to do HMAT for end to end. Gets
> > more fun when they are in the host side of the root bridge.
> >
>
> Fair, I wanted to reference the proposals but I personally don't have a
> strong understanding of this yet. Dave Jiang mentioned wanting to write
> some info on CDAT with some reference to the Generic Port work as well.
>
> Some help understanding this a little better would be very much
> appreciated, but I like your summary below. Noted for updated version.
>
> > # Generic Port
> >
> > In the scenario where CXL memory devices are not present at boot, or
> > not configured by the BIOS, or the BIOS has not provided full HMAT
> > descriptions for the configured memory, we may still want to
> > generate proximity domain configurations for those devices.
> > The Generic Port structures are intended to fill this gap, so
> > that performance information can still be utilized when the
> > devices are available at runtime by combining host information
> > with that discovered from devices.
> >
> > Or just
> >
> > # Generic Ports
> >
> > These are fun ;)
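To make that a bit more concrete: the rough idea is that the host describes
the CPU-to-host-bridge leg (Generic Port in SRAT, numbers in HMAT) and the
device describes its own leg via CDAT, and the two get composed when the
region shows up at runtime. Below is a minimal userspace sketch of that
composition - the struct and function names are made up for the example,
not the kernel's actual types - just to show that latency accumulates along
the path while bandwidth is capped by the narrowest hop.

    #include <stdio.h>

    /* Hypothetical names for this sketch only, not the kernel's API. */
    struct perf_coord {
            unsigned int read_latency;     /* e.g. ns */
            unsigned int write_latency;
            unsigned int read_bandwidth;   /* e.g. MB/s */
            unsigned int write_bandwidth;
    };

    static unsigned int min_u(unsigned int a, unsigned int b)
    {
            return a < b ? a : b;
    }

    static void combine_coords(const struct perf_coord *host_gp,
                               const struct perf_coord *dev_cdat,
                               struct perf_coord *out)
    {
            /* Latencies accumulate along the path... */
            out->read_latency = host_gp->read_latency + dev_cdat->read_latency;
            out->write_latency = host_gp->write_latency + dev_cdat->write_latency;
            /* ...while bandwidth is limited by the narrowest hop. */
            out->read_bandwidth = min_u(host_gp->read_bandwidth,
                                        dev_cdat->read_bandwidth);
            out->write_bandwidth = min_u(host_gp->write_bandwidth,
                                         dev_cdat->write_bandwidth);
    }

    int main(void)
    {
            struct perf_coord gp  = { 150, 150, 64000, 64000 }; /* host side */
            struct perf_coord dev = { 350, 450, 32000, 28000 }; /* from CDAT */
            struct perf_coord total;

            combine_coords(&gp, &dev, &total);
            printf("read: %u latency, %u MB/s\n",
                   total.read_latency, total.read_bandwidth);
            return 0;
    }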
> > > ====
> > > HMAT
> > > ====
> > > The Heterogeneous Memory Attributes Table contains information such as
> > > cache attributes and bandwidth and latency details for memory proximity
> > > domains. For the purpose of this document, we will only discuss the
> > > SLLBI entry.
> >
> > No fun. You miss Intel's extensions to memory-side caches ;)
> > (which is wise!)
> >
>
> Yes yes, but I'm trying to be nice. I'm debating on writing the
> Section 4 interleave addendum on Zen5 too :P

What do they get up to? I've not seen that one yet!
May be a case of 'Hold my beer' for these crazies.

> > > ==================
> > > NUMA node creation
> > > ===================
> > > NUMA nodes are *NOT* hot-pluggable. All *POSSIBLE* NUMA nodes are
> > > identified at `__init` time, more specifically during `mm_init`.
> > >
> > > What this means is that the CEDT and SRAT must contain sufficient
> > > `proximity domain` information for linux to identify how many NUMA
> > > nodes are required (and what memory regions to associate with them).
> >
> > Is it worth talking about what is effectively a constraint of the spec
> > and what is a Linux current constraint?
> >
> > SRAT is the only ACPI-defined way of getting proximity domains. Linux
> > chooses to at most map those 1:1 with NUMA nodes.
> > CEDT adds a description of SPA ranges where there might be memory that
> > Linux might want to map to 1 or more NUMA nodes.
> >
>
> Rather than asking if it's worth talking about, I'll spin that around
> and ask what value the distinction adds. The source of the constraint
> seems less relevant than "All nodes must be defined during mm_init by
> something - be it ACPI or CXL source data".
>
> Maybe if this turns into a book, it's worth breaking it out for
> referential purposes (pointing to each point in each spec).

Fair point. It doesn't add much.

> > > Basically, the heuristic is as follows:
> > > 1) Add one NUMA node per Proximity Domain described in SRAT
> >
> > if it contains memory, CPU or generic initiator.
>
> noted
>
> > > 2) If the SRAT describes all memory described by all CFMWS
> > >    - do not create nodes for CFMWS
> > > 3) If SRAT does not describe all memory described by CFMWS
> > >    - create a node for that CFMWS
> > >
> > > Generally speaking, you will see one NUMA node per Host bridge, unless
> > > inter-host-bridge interleave is in use (see Section 4 - Interleave).
> >
> > I just love corners: QoS concerns might mean multiple CFMWS and hence
> > multiple nodes per host bridge (feel free to ignore this one - has
> > anyone seen this in the wild yet?) Similar mess for properties such
> > as persistence, sharing etc.
>
> This actually came up as a result of me writing this - this does exist
> in the wild and is causing all kinds of fun for the weighted_interleave
> functionality.
>
> I plan to come back and add this as an addendum, but probably not until
> after LSF.
>
> We'll probably want to expand this into a library of case studies that
> cover these different choices - in hopes of getting some set of
> *suggested* configurations for platform vendors to help play nice with
> linux (especially for things that actually consume these blasted nodes).

Agreed. We'll be looking back on this in a year or so and thinking,
wasn't life nice and simple back then!

Jonathan

>
> ~Gregory
>
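For anyone trying to picture the node-creation heuristic above, here is a
toy userspace walk-through of it. Illustrative only - made-up SRAT/CFMWS
data and a simplified containment check, not the actual srat.c / CXL ACPI
parsing: one possible node per SRAT proximity domain that has a CPU, memory
or generic initiator, and a CFMWS window only gets its own node when SRAT
did not already describe that SPA range.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    struct srat_pxm { uint32_t pxm; bool has_cpu, has_mem, has_gi; };
    struct memblk   { uint64_t base, size; };  /* SRAT Memory Affinity ranges */
    struct cfmws    { uint64_t base, size; };  /* CEDT fixed memory windows */

    /* crude check: is this window fully described by some SRAT memory range? */
    static bool srat_covers(const struct memblk *blks, unsigned int n,
                            const struct cfmws *w)
    {
            for (unsigned int i = 0; i < n; i++)
                    if (blks[i].base <= w->base &&
                        blks[i].base + blks[i].size >= w->base + w->size)
                            return true;
            return false;
    }

    int main(void)
    {
            /* 1) one possible node per SRAT PXM containing CPU, memory or GI */
            struct srat_pxm pxms[] = {
                    { .pxm = 0, .has_cpu = true, .has_mem = true },
                    { .pxm = 1, .has_gi = true },
            };
            /* SRAT memory affinity only describes the first 4GB */
            struct memblk blks[] = { { 0x0, 0x100000000ull } };
            /* CEDT: two windows, e.g. one per host bridge */
            struct cfmws wins[] = {
                    { 0x0, 0x100000000ull },
                    { 0x100000000ull, 0x100000000ull },
            };
            int next_node = 0;

            for (unsigned int i = 0; i < sizeof(pxms) / sizeof(pxms[0]); i++)
                    if (pxms[i].has_cpu || pxms[i].has_mem || pxms[i].has_gi)
                            printf("PXM %u -> node %d\n", pxms[i].pxm, next_node++);

            /* 2)/3) a CFMWS only gets its own node if SRAT did not describe it */
            for (unsigned int i = 0; i < sizeof(wins) / sizeof(wins[0]); i++) {
                    if (srat_covers(blks, sizeof(blks) / sizeof(blks[0]), &wins[i]))
                            printf("CFMWS %u -> described by SRAT, no new node\n", i);
                    else
                            printf("CFMWS %u -> node %d\n", i, next_node++);
            }
            return 0;
    }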