Re: [ACPI Code First ECN] Enumerate "Inclusive Linear Address Mode" memory-side caches

Dan Williams <dan.j.williams@xxxxxxxxx> · Fri, 17 May 2024 13:20:06 -0700

Jonathan Cameron wrote:
> On Fri, 10 May 2024 16:00:24 -0700
> Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
> 
> > # Title: Enumerate "Inclusive Linear Address Mode" memory-side caches
> 
> So pretty much all my feedback is about using inclusive in anything
> to do with caches.  Term usually means something very different from
> what you have here and it confused me. As an example, consider a dataset
> that fits entirely in CPU L3 / L2 capacity.
> In that case the situation you describe here looks like an exclusive L2 / L3
> cache (line sits in one or the other but not both).

I clarify in the text that this term is an attribute of the *address-space* not
the cache hierarchy. If HMAT ever needed to describe that a multi-level
memory-side cache was Inclusive or Exclusive then I would likely steal bit3 of
Cache Attributes field to enumerate that detail, but it is not clear that detail
matters to any OS mechanism or policy.

> Maybe just describe the problem and skip the exact cause?
> 
> Enumerate "Unrecoverable aliases in direct mapped memory-side caches"

I have read that several times and I cannot map that title back to the property
this "address mode" enumeration is trying to describe.

I would prefer to just pile on more explicit clarifications to overcome that
instinct to map the word "Inclusive" to a multi-level cache attribute. Something
like "note, 'Inclusive Linear' address-mode not to be confused with
'Inclusive/Exclusive' multi-level cache organization".

> Whilst the CXL side of things (and I assume your hardware migration engine)
> don't provide a way to recover this, it would be possible to build
> a system that otherwise looked like you describe that did provide access
> to the tag bits and so wouldn't present the aliasing problem.

Aliasing problem? All direct-mapped caches have aliases, it just happens that
this address mode allows direct-addressability of at least one alias.

> > 
> > # Status: Draft v2
> > 
> > # Document: ACPI Specification 6.6
> > 
> > # License
> > SPDX-License-Identifier: CC-BY-4.0
> > 
> > # Submitter:
> > * Sponsor: Dan Williams, Intel
> > * Creators/Contributors:
> >     * Andy Rudoff, retired
> >     * Mahesh Natu, Intel
> >     * Ishwar Agarwal, Intel
> > 
> > # Changelog
> > * v2: Clarify the "Inclusive" term as "including the capacity of the cache
> >   in the SRAT range length"
> > * v2: Clarify that 0 is an undeclared / transparent Address Mode, and
> >   that Address Mode values other than 1 are Reserved.
> > 
> > # Summary of the Change
> > Enable an OSPM to enumerate that the capacity for a memory-side cache is
> > "included" in an SRAT range. Typically the "Memory Side Cache Size"
> > enumerated in the HMAT is "excluded" from the SRAT range length because
> > it is a transparent cache of the SRAT capacity. The enumeration of this
> > addressing mode enables OSPM memory RAS (Reliability, Availability, and
> > Serviceability) flows.
> 
> 'excluded' somehow implies it exists as something we might include but
> we don't.  'Not relevant' would be clearer wording I think.

But it is relevant. If the near memory (cache memory) is 64GB and the far memory
(backing store) is 64GB then the SRAT range is 64GB (cache-excluded). With this
new mode the SRAT range is 128GB.
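
To make the arithmetic concrete, here is a minimal sketch of the two layouts. The function and parameter names are hypothetical and are not taken from the ECN or from any ACPI table parser; the only thing it encodes is the 64GB + 64GB example above.

```python
GIB = 1 << 30


def srat_range_length(far_bytes: int, near_bytes: int, inclusive_linear: bool) -> int:
    """Return the length the SRAT range would report (illustrative only).

    far_bytes:  backing-store capacity (e.g. CXL)
    near_bytes: memory-side cache capacity (e.g. DDR)
    """
    # Transparent cache: only the backing store is addressable, so the
    # cache capacity is excluded from the range length.
    # Inclusive Linear: the cache capacity is counted in the range as well.
    return far_bytes + near_bytes if inclusive_linear else far_bytes


print(srat_range_length(64 * GIB, 64 * GIB, inclusive_linear=False) // GIB)
print(srat_range_length(64 * GIB, 64 * GIB, inclusive_linear=True) // GIB)
```

In both cases the HMAT "Memory Side Cache Size" would still report 64GB; only the SRAT length differs.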

> > Recall that the CXL specification allows for platform address ranges to
> > be interleaved across CXL and non-CXL targets. CXL 3.1 Table 9-22 CFMWS
> > Structure states "If the Interleave Set spans non-CXL domains, this list
> > may contain values that do not match \_UID field in any CHBS structures.
> > These entries represent Interleave Targets that are not CXL Host
> > Bridges". For an OSPM this means address translation needs to be
> > prepared for non-CXL targets. Now consider the case when that CXL
> > address range is flagged as a memory side cache in the ACPI HMAT.
> 
> A CXL address range can be flagged as having a memory-side cache in
> front of it but, as you've stated, it normally wouldn't have separate HPA
> ranges. The interleave stuff doesn't get you to what you describe
> here as it's well defined, not a transparent cache like a
> memory-side cache.  A given cacheline is in a known FRU, not potentially
> multiple ones. Hence I'm not sure this paragraph is particularly useful.

It was an attempt to show precedent for why Linux needs to care about the memory
organization and how CFMWS does not achieve this description. That said, as this
is text that only appears in the justification for the ECN I do not mind
dropping it.

> > Address translation needs to consider that the decode for an error may
> > impact multiple components (FRUs, field-replaceable units).
> > 
> > Now consider the implications of ["Flat Memory Mode" (Intel presentation
> > at Hot Chips
> > 2023)](https://cdrdv2-public.intel.com/787386/Hot%20Chips%20-%20Aug%2023%20-%20BHS%20and%20Granite%20Rapid%20-%20Xeon%20-%20Architecture%20-%20Public.pdf).
> 
> Other than telling us someone put it on a slide, that slide provides
> very little useful info!

Hence this write-up in the ECN. I felt it was better than nothing to include a
picture for reference.

> > This cache geometry implies an address space that includes the
> > memory-side cache size in the reported address range. For example, a
> > typical address space layout for a memory-side-cache of 32GB of DDR
> > fronting 64GB of CXL would report 64GB in the "Length" field of the
> > SRAT's "Memory Affinity Structure" and 32GB in the "Memory Side Cache
> > Size" field of the HMAT's "Memory Side Cache Information Structure".
> 
> > An
> > inclusive address-space layout of the same configuration would report
> > 96GB in the "Length" field of the SRAT's "Memory Affinity Structure" and
> > 32GB in the "Memory Side Cache Size" field of the HMAT's "Memory Side
> > Cache Information Structure". The implication for address translation in
> > the inclusive case is that there are N potential aliased addresses
> > impacted by a memory error where N is the ratio of:
> > 
> > SRAT.MemoryAffinityStructure.Length /
> > HMAT.MemorySideCacheInformation.CacheSize
> 
> So in your example a memory error can affect any of 3 addresses.
> 
> That feels like it is assuming a particular caching strategy without
> expressly stating it. Let us take it to extreme.  Make it a fully
> associative non-inclusive DDR cache (sure that is insane, but bear
> with me). Now any potential problem affects all addresses as a given error
> in the memory-side cache might affect anything - given it's fully associative
> it's also possible an error in the CXL memory might also be any cacheline
> in the system.
> 
> The memory-side cache description does include the option of specifying
> the cache is direct mapped so if that is set your assumed mapping is valid.
> If someone set the 'complex cache indexing' option then I think all bets
> are off. To be useful you should rule that out in your spec change.

Sure, "Linear" implies direct-mapped since fully associative is a
non-linear arrangement.
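
One way an OSPM could gate the alias math on that property is sketched below. This is an assumption-laden illustration, not proposed spec text: the enum, its numeric values, and the function name are hypothetical, and treating a non-integer ratio as "unknown" is my own choice rather than something the ECN states.

```python
from enum import Enum
from typing import Optional

GIB = 1 << 30
ADDRESS_MODE_INCLUSIVE_LINEAR = 1  # per the ECN: 0 is undeclared, other values reserved


class Associativity(Enum):
    # Placeholder values for this sketch, not a claim about the HMAT bit encoding.
    NONE = 0
    DIRECT_MAPPED = 1
    COMPLEX_INDEXING = 2


def alias_count(range_len: int, cache_size: int,
                assoc: Associativity, address_mode: int) -> Optional[int]:
    """Return N, the number of directly addressable aliases, or None if unknown."""
    # The N = length / cache-size ratio is only meaningful for a
    # direct-mapped cache that declares the Inclusive Linear address mode.
    if address_mode != ADDRESS_MODE_INCLUSIVE_LINEAR:
        return None
    if assoc is not Associativity.DIRECT_MAPPED:
        return None
    # Assumption: a ratio that is not a whole number is treated as unknown.
    if cache_size <= 0 or range_len % cache_size:
        return None
    return range_len // cache_size


print(alias_count(96 * GIB, 32 * GIB, Associativity.DIRECT_MAPPED, 1))
print(alias_count(96 * GIB, 32 * GIB, Associativity.COMPLEX_INDEXING, 1))
```

With the 32GB DDR / 64GB CXL example the first call yields 3, matching the "any of 3 addresses" observation, while the complex-indexing case yields no answer.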

> > This change request is not exclusive to CXL, the concept is applicable
> > to any memory-side-cache configuration that the HMAT+SRAT can describe.
> > However, CXL is a primary motivator given the OSPM role in address
> > translation for device-physical-address (DPA) events being translated to
> > impacted host-physical-address (HPA) events.
> > 
> > # Benefits of the Change
> > An OSPM, when it knows about inclusive cache address space, can take
> > actions like quarantine / offline all the impacted aliased pages to
> > prevent further consumption of poison, or run repair operations on all
> > the affected targets. Without this change an OSPM may not accurately
> > identify the HPA associated with a given CXL FRU event, or it may
> > misunderstand that an SRAT memory affinity range is an amalgam of CXL
> > and cache capacity.
> 
> Could you add a cache attribute to say it's a non-inclusive / exclusive
> cache? That combined with direct-mapped would I think provide the relevant
> indication.  It still runs into the problem that advanced hardware
> could still resolve which alias is the problem. So maybe we are better
> off sticking to describing the fact that there is an alias issue for any
> reported errors that cannot be resolved (presumably you can poke the
> aliased entries and see which one gives poison via synchronous access).

I still disagree with the implication that "inclusion" is a property of the
cache and not the address layout for this ECN.

> Note that I'm not keen on the use of inclusive for your range description
> because that terminology means the exact opposite of what you intend
> when applied to a normal cache! I can't think of a better term though
> but the bikeshed should not be blue.

I am sticking with "include" since cache capacity is included in the SRAT
range, and will move off that term when/if someone comes up with something
better.

[..]
> > 
> > * Extend the implementation note after Table 5.149 to explain how to
> >   interpret the "Inclusive linear" mode.
> > 
> >     * "When Address Mode is 1 'Inclusive Linear' it indicates that there
> >       are N directly addressable aliases of a given cacheline
> >       where N is the ratio of target memory proximity domain size and
> >       the memory side cache size.  Where the N aliased addresses for a
> >       given cacheline all share the same result for the operation
> >       'address modulo cache size'."
> 
> That description is somewhat tighter than the free form one in the intro
> so answered a lot of questions I had before getting this far.

Happy to delete all of the text outside of "Detailed Description of the Change"
since none of it will be included in ACPI spec.
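
For reference, the "address modulo cache size" rule in the implementation note quoted above can be sketched as follows. The function name and parameters are hypothetical, and the sketch assumes the modulo is taken on the offset from the start of the SRAT range, which matches a plain "address modulo cache size" whenever the range base is aligned to the cache size.

```python
GIB = 1 << 30


def aliased_addresses(hpa: int, range_base: int, range_len: int, cache_size: int) -> list[int]:
    """List every address in the range that shares a cache slot with hpa."""
    if not (range_base <= hpa < range_base + range_len):
        raise ValueError("address is outside the SRAT range")
    if cache_size <= 0 or range_len % cache_size:
        raise ValueError("range length is not a whole multiple of the cache size")

    n = range_len // cache_size               # number of aliases, N
    index = (hpa - range_base) % cache_size   # shared 'address modulo cache size'
    # Each alias sits one cache-size stride further into the range.
    return [range_base + index + k * cache_size for k in range(n)]


# 32GB cache fronting 64GB of backing store: a 96GB inclusive range, N = 3.
for addr in aliased_addresses(40 * GIB + 0x1000, 0, 96 * GIB, 32 * GIB):
    print(hex(addr))
```

An OSPM handling a poison event for one of these addresses could then quarantine or offline the pages backing all N results, which is the RAS flow the ECN is trying to enable.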